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This  dissertation  describes  a computer  vision  approach  for  guiding 
the  blind.  The  design  concepts  of  such  a vision  system  are  specified, 
and  new  image  techniques  for  the  interpretation  of  three-dimensional 
real-world  scenes  are  developed. 

In  the  development  of  these  image  techniques,  we  first  investigate 
the  mapping  principles  which  relate  the  two-dimensional  image  with  the 
three-dimensional  real  world. 

A simple  but  efficient  decomposition  of  the  complex  real  world  into 
a modular  structure  is  presented.  In  this  structure,  we  identify  the 
problems  most  pertinent  for  the  safe  and  enhanced  guidance  of  the  blind. 
These  problems  are  planning  a safe  path  for  the  blind,  identifying 
shadows,  detecting  depressions,  and  discriminating  upright  objects  from 
flat  objects.  To  each  of  these  problems,  we  specify  their  characteristic 
visual  features,  and  develop  the  appropriate  image  techniques  which 
exploit  these  features  in  an  effective  and  original  fashion. 


vi 


By  means  of  experimental  evaluation,  mathematical  derivations,  or  a 
combination  of  both,  we  demonstrate  the  soundness  of  these  image 
techniques,  which  are  in  harmony  with  the  visual  cues  used  by  humans. 
Computer  results  presented  in  this  dissertation  prove  this  assertion. 
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CHAPTER  I 
INTRODUCTION 

During  the  past  half  century,  considerable  efforts  have  been 
devoted  towards  improving  the  hazardous  and  discouraging  world  of  the 
blind.  The  needs  of  the  blind  have  been  defined  and  explored  across  the 
spectrum,  from  the  social  to  the  scientific  and  technological.  Notable 
advances  have  been  made  in  the  areas  of  social  adjustment,  vocational 
rehabilitation,  and  in  the  design  of  communication  and  learning  devices 
[1,2].  However,  little  progress  has  been  made  in  the  effort  to  provide 
guidance  aids  for  the  blind.  Recent  scientific  and  technological 
breakthroughs  include  visual  techniques  in  robotics  with  a fraction  of  a 
millimeter  (mm)  in  positioning  accuracy,  automatic  guidance  of  missiles 
and  aircrafts  with  extreme  precision,  and  remote  sensing  and  information 
gathering.  Yet  the  mobility  needs  of  the  blind  are  still  met  by  the 
traditional  cane  and  guide  dog.  In  the  words  of  Paul  A.  Zahl  [3],  "A 
civilization  with  such  skills  should  be  able  to  develop  guidance  aids 
for  the  blind  more  knowing  than  the  cane,  more  dependable  than  the  dog" 
(page  443). 

The  design  of  guidance  aids  for  the  blind  can  be  pursued  along  two 
major  directions  [4]:  (1)  enhancing  the  information  processing 

capability  of  other  sensory  organs,  and  (2)  providing  the  blind  with 
limited  artificial  seeing  capability.  Enhancement  of  the  information 
processing  capability  of  other  sensory  organs  may  be  accomplished  either 
by  training  the  person  to  cope  with  problems  created  by  the  loss  of 
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vision,  or  by  designing  devices  to  convert  primitive  visual  information 
or  reflected  echos  from  transmitted  signals  into  either  tactile  or 
auditory  information.  A majority  of  the  research  and  development  work 
on  guidance  aids  for  the  blind  in  the  past  two  decades  has  focused  on 
this  area.  To  provide  the  blind  with  limited  artificial  seeing 

capability  requires  the  design  of  vision  systems  that  are  capable  of 
analyzing  and  interpreting  real-world  scenes.  These  systems  must  yield 
descriptions  suitable  for  the  blind  that  can  provide  safe  guidance  and 
enhance  mobility.  This  research  direction,  which  seems  a natural  and 
promising  way  to  solve  the  given  problem  of  the  design  of  guidance  aids, 
has  received  much  less  attention  than  it  deserves.  The  work  presented 
in  this  dissertation  pursues  this  last  described  research  direction. 

There  are  two  main  objectives  sought  in  this  dissertation.  The 
first  objective  is  to  stress  the  feasibility  of  this  research  direction, 
in  a global  sense,  by  presenting  a complete  solution  to  the  problem  from 
the  design  aspects  to  the  development  of  the  necessary  image 
techniques.  The  second  and  more  specific  objective  is  to  transcend  the 
guidance  schemes  of  the  past  and  develop  new  image  techniques  for 
enhanced  image  interpretation  that  lead  to  improved  mobility  of  the 
bl i nd. 

Chapter  II  presents  a survey  on  guidance  aids  for  the  blind.  This 
survey  includes  some  computer  vision-based  guidance  systems  which  are 
not  designed  specifically  for  the  blind,  but  are  relevant  to  our  study 

since  they  attempt  to  understand  and  negotiate  their  way  in  real-world 
envi ronments . 

Chapter  III  investigates  the  design  requirements  of  a computer 
vision  system  aimed  at  guiding  the  blind.  The  organizing  principles 
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guiding  the  process  of  analysis  and  interpretation  of  real-world  scenes 
are  established.  A functional  structure  linking  the  various  image 
techniques  to  yield  the  integrated  vision  system  is  described.  Since 
subsequent  chapters  deal  with  the  analysis  of  two-dimensional  (2-D) 
digitized  images,  geometrical  formulae  establishing  the  mapping 

principles  from  a 2-D  digitized  image  to  a three-dimensional  (3-D)  real- 
world  representation  are  derived. 

The  overall  process  of  interpreting  2-D  images  of  3-D  real-world 
environments  is  composed  of  two  modes  of  analysis.  Mode  1 consists  of 
image  techniques  which  operate  directly  on  the  digitized  image  in  order 
to  minimize  processing  time.  These  are  image  techniques  which  determine 
the  safety  path  for  the  blind,  a process  we  refer  to  as  safety  path 
planning;  image  techniques  which  identify  shadows;  and  image  techniques 
which  detect  depressions.  Mode  2 consists  of  image  techniques  which 
require  the  additional  processing  step  of  either  binary  image  generation 
(a  process  which  extracts  object(s)  from  ground),  or  edge  detection  (a 
process  which  produces  an  outline  of  the  viewed  scene).  These  are  image 
techniques  which  discriminate  upright  objects  from  flat  ones,  and  image 
techniques  which  provide  primitive  descriptions  of  detected  objects. 
The  need  for  mode  2 of  analysis  rises  from  the  fact  that  some  image 
features  are  easier  to  extract  from  either  a binary  image  or  an  outline 
of  the  image  than  from  a digitized  image.  Moreover,  this  overall 
process  is  designed  such  that  an  image  technique  in  one  mode  can  call 
upon  an  image  technique  in  the  other  mode  for  assistance  in  the 
interpretation  of  a given  scene. 

In  Chapter  IV,  three  image  techniques  which  support  the  safety  path 
planning  are  described.  The  first  technique  is  called  the  wide-angle 
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view  technique.  The  objective  of  the  wide-angle  view  technique  is  to 
acquire  as  great  an  area  as  possible  by  combining  left,  front  and  right 
images  of  the  environment  from  a given  viewpoint  for  the  purpose  of  an 
optimal  tracing  of  the  safety  path.  This  technique  is  particularly 
important  in  both  the  initial  state  of  the  guidance  process,  and 
whenever  a decision  to  change  the  direction  of  travel  is  made.  The 
safety  path  from  the  above  three  images  and  all  subsequent  images  are 
generated  using  the  second  image  technique  which  is  called  the  first- 
pass  evaluation.  This  first-pass  evaluation  technique  is  a systematic 
assessment  of  the  gray  level  (intensity)  variations  of  the  image  and  the 
objective  is  to  find  the  image  areas  with  surfaces  consistent  with  an 
initial  area  found  to  be  obstacle-free.  Essential  safety  path  cues  such 
as  safe  step,  obstacle  ahead,  turn  left/right  are  provided  in  real- 
time. However,  a global  interpretation  of  the  scene  would  require  the 
additional  processing  time  needed  to  execute  the  necessary  software 
routines.  This  aspect  of  the  problem  would  ultimately  require  a 
solution  that  involved  concurrent  processing.  The  third  technique,  the 
second-pass  evaluation,  is  devised  to  provide  a primitive  description  of 
the  object(s)  detected  by  the  first-pass  evaluation.  This  third 

techique,  which  is  contained  in  mode  2,  operates  on  the  binary  image. 

It  is  difficult  to  conceive  of  a computer  vision-based  guidance 
system  analyzing  images  of  real-world  scenes,  without  defining  the 
principles  underlying  the  image  interpretation  of  such  problems  as 
shadows,  depressions  or  drops,  and  upright  objects  versus  flat  lying 
ones.  In  the  reported  literature,  no  serious  attempts  have  been  made  to 
understand  these  problems.  Often,  they  are  simply  circumvented  through 
the  imposition  of  ideal  settings.  Certainly,  these  problems  represent 
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intricate  image  analysis  tasks,  but  they  also  constitute  an  essential, 
if  not  integral  part  in  the  design  of  a computer  vision-based  guidance 
system.  Chapter  V develops  new  image  techniques  as  a solution  to  these 
problems. 

In  Chapter  VI,  an  analysis  of  the  system  errors  is  carried.  Both 
the  error  due  to  deviations  associated  with  system  motion,  and  the  error 
in  estimating  the  range  are  evaluated. 

Chapter  VII  summarizes  the  major  accomplishments  reported  in  this 
dissertation,  and  provides  suggestions  for  future  research. 


CHAPTER  II 

GUIDANCE  AIDS  FOR  THE  BLIND 


As  early  as  1944,  a National  Research  Council  [3]  was  established 
to  assess  the  various  needs  of  the  blind.  In  this  assessment,  it  was 
recognized  that  there  was  a desperate  need  for  research  and  development 
of  sensory  devices  to  help  the  blind  in  their  daily  mobility  needs.  In 
four  years,  the  efforts  of  this  committee  generated  relevant  information 
on  possible  research  avenues,  several  pilot  instruments,  and  some 
preliminary  results.  In  a summary  of  progress,  a member  of  the 
committee  stated  that,  "With  the  information  thus  gained,  undoubtedly 
the  physicists  and  engineers  will  want  to  have  the  instruments  in  hand 
again  for  another  cycle  of  redesigning  and  improvement,  and  of 
retesting"  (page  433).  However,  even  with  today's  space-age  and 
information-age  technology,  providing  practical  means  to  aid  the  blind 
in  their  search  for  relatively  easy  and  efficient  mobility  is  a problem 
that  is  yet  to  be  resolved. 

A survey  on  the  various  research  and  development  studies  on 
guidance  aids  is  given  below.  Most  of  these  studies  follow  familiar 
lines  of  research  with  the  goal  of  advancing  earlier  concepts.  Few  have 
taken  up  new  research  topics. 

2.1  Early  Concepts  and  Resulting  Designs 

Early  concepts  that  led  to  the  design  of  guidance  devices  for  the 
blind  can  be  classified  into  three  main  categories:  (1)  devices  which 
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make  use  of  electromagnetic  waves  [5-7];  (2)  devices  which  make  use  of 
ultrasonic  waves  [8-12];  and  (3)  the  more  recent  tactile  vision  devices, 
which  transform  visual  information  into  tactile  information  [13-16]. 

The  principle  of  electromagnetic  and  ultrasonic  devices  is  based 

upon  emitting  electromagnetic  or  ultrasonic  signals  and  detecting  and 
decoding  that  portion  of  the  signal  which  is  reflected  by  encounteri ng 
objects.  Most  of  these  devices  are  simple  designs  with  varying  degrees 
of  practicality.  Some  of  them  are  mere  obstacle  detectors,  others,  with 
added  sophistication  can  provide  range  information  and  even  some 
primitive  features  of  the  object  such  as  texture.  Certainly,  these 

devices  have  their  practical  uses,  but  to  serve  as  guidance  aids  they 
must  first  overcome  the  following  drawbacks:  (1)  environmental  sensing 

capability  is  limited  to  short  range  objects,  and  objects  which  are 

small  or  thin  and  obstacles  such  as  curbs  may  not  be  detected;  (2)  it  is 
difficult  to  make  any  spatial  judgement  about  the  detected  object(s) 

with  the  information  extracted  from  the  reflected  signal;  (3)  not  all 
objects  reflect  the  transmitted  signal,  and  weak  reflections  may  not  be 
detected  due  to  outdoor  noise;  and  (4)  directionality  is  often  reduced 
to  going  from  one  detected  obstacle  to  another. 

The  tactile  vision  devices  convert  visual  information  into  tactile 
information.  The  visual  information,  a digitized  image  of  the  camera  is 
mapped,  point  for  point,  into  an  array  of  vibrotactile  stimulators  in 
contact  with  the  blind.  The  degree  of  stimulation  is  a function  of  the 
gray  level  intensities  of  the  image.  This  information  is  reduced, 
however,  to  ON  and  OFF  type  of  information  where,  the  ON  stimulators 
indicate  the  presence  of  the  object  and  the  OFF  stimulators  characterize 
the  background.  These  devices  are  still  in  the  experimental  stage  and 
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suffer  from  various  problems  which  have  yet  to  be  resolved.  These  are 
(1)  biophysical  difficulties  related  to  the  limited  form-sensing  and 
spatial  resolution  of  the  skin,  and  to  the  crosstalk  between  stimuli 
[17];  (2)  complex  images  yield  complex  tactile  patterns  which  exceed  the 
perceptual  integration  capability  of  the  skin,  thus,  making  the  tactile 
patterns  extremely  difficult  to  interpret;  (3)  there  is  no  distinct  way 
to  extract  the  range  information;  and  (4)  only  objects  very  near  the 
blind  may  be  recognized. 

2 • 2 Recent  Research  Studies  and  Developments 

The  considerable  progress  made  in  the  past  in  computer  vision  and 
image  processing  applications  together  with  the  improvements  in  size  and 
speed  of  computers  have  led  to  a new  research  direction  in  guidance  aids 
called  the  computer  vision  approach.  Along  this  line  of  research, 
various  studies  have  been  reported  [18-23].  Although  only  Deering's 
work  [18]  deals  specifically  with  guidance  for  the  blind,  the  central 
theme  of  all  these  studies  is  still  that  of  processing  and  interpreting 

2-D  images  of  real-world  scenes  for  the  purpose  of  achieving  autonomous 
guidance. 

The  computer  vision  system  described  in  Deering's  study  [18]  is 
developed  to  guide  the  blind  through  city  sidewalks.  Image 

interpretation  is  based  on  extracting  edges  of  a viewed  scene  using  an 
edge  operator  and  matching  these  edges  to  a sidewalk  model  that  is 
stored  apriori  in  a database.  With  the  results  of  the  edge-matching 
process,  the  system  orients  itself  through  the  sidewalk,  and  detects  and 
avoids  large  objects  within  this  sidewalk.  To  ease  the  matching 

process,  the  sidewalk  scene  is  assumed  clean  and  shadow-free.  The  main 
problems  of  this  study  are  the  following: 
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(1)  Since  a computer  model  of  the  sidewalk  is  used,  the 
vision  system  will  not  adapt  to  sidewalks  which  do  not 
fit  the  model . 

(2)  It  is  assumed  that  the  borders  of  the  sidewalk  and  the 
majority  of  objects  within  it  have  a high  contrast  with 
respect  to  the  rest  of  the  scene.  This,  however,  is  not 
always  true,  especially  in  noisy  outdoor  scenes.  Thus, 
if  this  assumption  fails,  edges  of  low  contrast  objects, 
perhaps  obstacles,  may  not  be  extracted  resulting  in 
objects  (obstacles)  being  left  undetected. 

(3)  There  is  no  mention  of  the  approach  to  be  taken  in  the 
event  the  sidewalks  are  not  shadow-free. 

This  work,  nonetheless,  has  great  merit.  It  has  shown  the  feasibility 
of  such  a design,  and  has  implemented  in  real  time  the  edge  detection 
and  matching  process  for  the  identification  of  city  sidewalks. 

The  research  studies  reported  in  these  studies  [19-23]  use  either  a 
roving  robot  or  an  autonomous  vehicle  as  the  subject.  It  should  be 
clear  that  what  can  be  achieved  by  these  systems  in  terms  of  image 
understanding  techniques  can  be  readily  extended  to  vision  systems  for 
the  blind.  Of  these  systems,  the  one  with  perhaps  the  strongest  impact 
on  the  area  of  computer  vision  systems  design  is  the  work  by  Moravec 
[19].  This  study  details  both  the  hardware  and  software  modules  of  the 
system.  The  vision  system  approach  is  to  model  all  objects  contained  in 
the  scene  into  circles  which  circumscribe  them.  The  roving  robot  itself 
is  modeled  as  a circle  with  a given  diameter.  The  guidance  problem  is 
then  reduced  to  tracing  for  the  robot  the  shortest  path  to  a given 
destination,  through  the  obstacles.  An  elaborate,  well  controlled 
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scheme  is  used  to  recover  the  range  information  from  a set  of  images. 
To  this  important  study,  we  attribute  the  following  contributions:  (1) 

development  of  an  interest  operator/correl ator  capable  of  extracting  the 
range  information  from  a series  of  2-D  images;  (2)  design  of  a camera 
calibration  process  which  determines  and  corrects  distortion  effects; 
and  (3)  implementation  of  an  algorithm  which  finds  the  shortest  path  to 
a destination  through  relatively  large  objects.  The  methods  used  are 
recognized  as  crude,  resulting  in  the  slow  processing  time  of  10  to  15 
minutes  before  the  robot  is  directed  to  move  0.75  meters  (m),  but  with 
added  research  [20],  the  methods  can  be  improved.  However,  one  major 
drawback  associated  with  this  study  is  the  fact  that  the  image 
interpretation  is  again  constrained  to  ideal  settings,  and  the  overall 
process  does  not  rise  above  the  obstacle  detection  and  avoidance 
probl em. 

Another  interesting  study  is  reported  by  Gennery  [21].  In  this 

study,  emphasis  is  placed  on  the  recovery  of  the  third  dimension  from 
stereo  images.  The  main  objective  is  to  separate  objects  from  the 
ground  surface.  It  is  assumed  that,  in  small  subdivisions,  the  ground 
surface  can  be  approximated  by  a plane  or  a paraboloid.  Using  a set 
threshold,  a high  resolution  correlator  is  used  to  determine  the  image 
points  which  are  likely  to  belong  to  objects.  These  points  are  then 

approximated  by  ellipsoids.  Through  an  iterative  least  squares  method, 
these  ellipsoids  are  either  broken  or  merged  to  yield  the  best 

segmentation  or  the  separation  of  objects  from  the  ground  surface. 

Furthermore,  with  the  stereo  image  data,  a statistical  correlation  of 
the  image  points  determines  the  position  and  shape  of  the  ground 
surface.  The  methods  used  in  this  study  are  error-prone  and  need  to  be 
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perfected,  but  then,  the  problems  they  address,  namely,  the 
discrimination  of  objects  from  ground  surface  and  the  determination  of 
the  ground  surface  orientation,  are  very  difficult  problems.  With 
continued  research  in  this  direction,  these  methods  could  prove  very 
useful  to  computer  vision-based  guidance  systems. 

The  work  by  Inigo  et  al.  [22]  pertains  to  the  autonomous  guidance 
of  a mobile  robot.  The  scene  used  is  that  of  an  arcade.  The  guidance 
process  is  based  on  tracking  one  edge  of  the  arcade.  The  edges  are 
found  using  the  Sobel  operator  supplemented  by  a Hough  transform  method 
to  recover  missing  edge  points.  The  detection  of  obstacles  is  based  on 
evaluating  the  shift  in  the  edges  of  the  stereo  images  which  are 
projected  in  the  (x,y)  coordinate  system.  A major  requirement  for  the 
proposed  system  is  that  the  boundaries  of  the  pathway  are  well 
defined.  As  in  Deering's  study  [18],  we  note  again  that  the  environment 
is  constrained,  in  this  case  to  an  arcade  with  an  ideal  path,  and  the 
edges  are  assumed  to  be  extractable.  Furthermore,  the  obstacle 
detection  scheme  has  been  used  for  a scene  that  is  assumed  to  contain 
only  one  object. 

Using  a similar  approach  as  in  the  study  of  Inigo  et  al . [22],  the 
work  by  Tsugawa  et  al.  [23]  describes  the  guidance  of  an  autonomous 
vehicle  equipped  with  stereo  vision.  The  process  consists  of  a road 
pattern  recognition  unit  and  a problem  solving  unit.  The  road  is 
classified  by  the  following  terms:  follow  course,  turn  right,  turn 

left,  obstacle  on  road,  and  dead-end.  The  obstacles  used  are  real 
obstacles  with  some  illusion  added.  Through  image  segmentation,  the 
vehicle  orients  itself  within  the  roadway,  and  avoids  encountered 
obstacles.  Again,  we  see  the  problem  of  guidance  being  reduced  to  edge 


12 


tracking  and  to  the  avoidance  of  obstacles  which  in  this  case  are 
characterized  by  the  added  illusion.  A major  achievement  of  this  study 

is  orienting  the  vehicle  within  the  roadway  at  the  relatively  high  speed 
of  30  km/hr. 


2.3  Pi scussi on 

In  this  survey,  we  note  that  the  early  devices  offer  limited 
information  about  the  real  world,  and  the  result  is  at  best  the 
provision  of  obstacle  avoidance  cues.  On  the  other  hand,  recent  vision 
systems  which  are  capable  of  extracting  rich  information  about  the  real 
world  have,  for  the  most  part,  reduced  this  information  to  yield  limited 
interpretation  results.  This  information  is  reduced  either  directly,  by 
limiting  the  environment  to  an  ideal  setting,  or  indirectly,  by 
simplifying  the  image  data  via  some  preprocessing  technique. 

In  retrospect,  if  we  are  to  design  a vision  system  which  will  guide 
and  enhance  the  mobility  of  the  blind  (or  some  mobile  vehicle  for  that 
matter),  it  is  important  that  we  make  note  of  the  following  points: 

(1)  If  constraints  are  made  in  the  design  process,  or  if  an 
ideal  setting  is  used  for  experimentation,  the  end 
results  should  overcome  these  constraints,  and  the 
vision  system  should  be  able  to  extend  to  other  settings 
which  are  not  necessarily  ideal. 

(2)  If  the  limits  of  the  processing  time  requires 

simplification  of  the  visual  data,  then  this 

simplification  must  be  consistent  with  the  original 
purpose  for  which  the  vision  system  was  designed. 


CHAPTER  III 

COMPUTER  VISION  APPROACH 

This  chapter  is  organized  into  four  sections.  In  the  first 
section,  a computer  vision  conceived  for  the  blind  is  introduced.  The 
functional  and  design  aspects  of  the  main  components  of  this  vision 
system  are  described.  In  the  second  section,  a tree-like  structure  of 
the  scene  analysis  is  given.  This  type  of  structure  allows  for  the 
modular  development  of  software  algorithms  which,  when  the  hardware 
permits,  can  be  reorganized  to  accomodate  parallel  processing.  The  flow 
of  events  taking  place  in  this  structure  is  analyzed.  In  the  third 
section,  the  organization  of  the  various  software  algorithms,  yielding 
the  integrated  system,  is  presented.  A decision  making  process 

controlling  this  integrated  system  is  described.  Finally,  serving  as  a 
transitional  step,  the  last  section  derives  the  geometrical  formulae 
relating  the  2-D  image  plane,  which  constitutes  the  domain  of  analysis 
for  subsequent  chapters,  to  the  3-D  real  world,  which  represent  the 
user's  domain,  in  this  case,  the  domain  of  the  blind. 

3. 1 The  Vision  System  Design 

A computer  vision  system  design  for  the  blind,  originally 
introduced  in  study  [24],  is  illustrated  in  Figure  3.1.  Two  versions  of 
the  design  with  distinctive  audio  units  are  given.  The  main  components 
of  this  system  are  (1)  a camera  unit  serving  as  a sensor  of  the  real 
world;  (2)  a microcomputer  system  to  interpret  what  the  camera  sees;  and 
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Figure  3.1  Computer  Vision  System  for  the  Blind 
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(3)  an  audio  unit  which  decodes  the  interpretation  results  for  the 
blind.  In  addition  to  these  main  components,  there  is  a knowledge  base 

to  assist  in  the  interpretation  of  those  vision  problems  which  cannot 
easily  be  characteri zed  in  symbolic  terms,  and  a tactile  unit  for  the 
blind-deaf.  In  this  tactile  unit,  only  the  safety  path  is  displayed  in 
order  to  eliminate  the  tactile  vision  problems  discussed  earlier,  and 
the  range  information  is  conveyed  to  the  blind-deaf  in  the  form  of  taps, 
where  each  tap  could  represent  a walking  step. 

3.1.1  The  Camera  Unit 

The  camera  used  in  this  system  can  have  either  a familiar, 
conventional  camera  tube  which  employs  electron-beam  scanning,  or  it  can 
have  the  recently  introduced  solid-state  imaging  sensors  which  use 
charge  coupled  devices  [25].  Presently,  there  are  two  types  of  cameras 
commercially  available  which  use  solid-state  imaging  sensors.  They  are 
the  charge  coupled  device  (CCD)  camera,  and  the  charge  injected  device 
(CIO)  camera.  Both  CCD  and  CID  cameras  are  basically  metal-oxyde- 
si1 icon  (MOS)  structures,  where  CCD  capacitors  are  used  to  constitute 
the  photosensitive  area.  The  amount  of  light  entering  the  lens  and 

falling  upon  the  photosensitive  area  is  converted  proportional ly  into  an 
array  of  electrical  charge  packets  collected  by  the  CCD  capacitors.  In 
the  CCD  structure  case,  by  applying  a voltage  to  the  silicon  substrate, 
the  charges  are  transferred  to  a storage  area  either  in  a line  transfer 
form  (one  line  at  a time)  or  in  a frame  transfer  form.  In  the  CID 
structure  case,  the  charges  remain  confined  to  the  photosensitive  area, 
and  are  accessed  via  an  X-Y  addressing  scheme. 
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Solid  state  cameras  have  not,  as  of  yet,  reached  the  resolution  of 
conventional  cameras,  but  they  offer  various  advantages  such  as 
ruggedness,  enhanced  sensitivity  in  low  light  level  environments,  small 
size,  low  power  consumption,  and  blooming  control  capability.  For 
difficult  applications  such  as  the  one  addressed  here,  it  is  desirable 
to  have  a camera  with  the  following  additional  features:  (1)  automatic 

focusing;  (2)  automated  aperture  for  regulating  the  amount  of  light 
entering  the  lens;  this  is  an  important  feature  for  cases  where  a 
sequence  of  frames  is  required  to  carry  an  image  analysis  task;  and  (3) 
fast  shutter  speed  to  alleviate  any  blurring  due  to  camera  motion  [26]. 

To  interface  this  camera  with  the  microcomputer  an  analog  to 

digital  (A/D)  converter  is  required.  This  A/D  convertor  converts  an 

analog  video  signal  into  its  digital  representation.  The  most  desirable 
A/D  convertor  to  have  in  this  case  is  one  that  would  digitize  a frame  in 
real  time,  in  other  words,  at  video  rate  or  1/30  second.  A camera 

control  unit  is  used  for  user's  control  over  the  camera  and  for  the 
image-taking  process. 

3.1.2  The  Microcomputer  System 

The  general  configuration  of  the  microcomputer  system  is 

illustrated  in  Figure  3.2.  This  configuration  disregards  parallelism  of 
operation  and  is  given  to  assess  the  basic  requirements  for  supporting 
the  image  processing  applications  presented  here.  It  will  become  clear, 
as  we  progress  in  this  dissertation,  that  the  ultimate  design  would  have 
a multiple-microprocessor  configuration,  where  each  microprocessor  is 
dedicated  to  a specific  image  task  [27-31]. 


Microprocessing 

Unit 
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Figure  3.2  Configuration  of  the  Microcomputer  System 
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It!£ — Microprocessor.  For  generally  complex  image  processing 
applications,  it  is  important  to  choose  the  16-bit  microprocessors  for 
they  support  enhanced  memory  management,  high  level -languages , powerful 
instruction  sets,  and  a sophisticated  interrupt  capability  [32]. 
Furthermore,  multiprocessor  interfacing  which  could  lead  to  real-time 
processing  of  complex  applications  is  readily  available  on  16-bit 

microprocessors. 

The  Memory.  The  effective  manipulation  of  image  data  and  the 
relatively  high  storage  requirement  for  both  image  data  and  software 
algorithms,  requires  high  density  and  short  memory  access  times. 
Current  technology  has  made  it  possible  for  read  only  memory  (ROM)  and 
random  access  memory  (RAM)  chips  to  grow  to  the  current  256k  bits/chip 

and  beyond,  with  an  access  time  as  low  as  5 nanoseconds.  For  image 

processing  applications  both  ROM  and  RAM  memory  chips  are  required.  The 
ROMs,  due  to  their  nonvolatile  characteristic,  store  programs.  The  RAMs 
store  image  data  which,  unlike  the  program  code,  changes  frequently.  In 
image  data  processing,  the  memory  density  required  for  an  image 

comprising  NxM  picture  elements  (pixels)  with  a 2m  gray  level  resolution 
is  NxMxm  bits.  Generally,  m = 6 is  a sufficient  resolution  for  adequate 
image  analysis.  The  values  for  N and  M are  a function  of  the  size  of 
film  used  and  the  resolution  (p)  of  the  digitizing  device.  For  example, 
a 35  mm  film  (35x24  mm),  using  a resolution  p of  lOOum  or  10_4m  would 
result  in  a digitized  image  of  350x240  pixels.  The  memory  requirements 
for  program  storage  is  a function  of  the  tasks  that  are  implemented  to 
perform  the  necessary  image  analysis. 
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Ih-e  Direct  Memory  Access  (DMA)  Controller.  The  DMA  controller  is 
extremely  useful  in  applications  where  the  required  data  transfer  rate 

is  higher  than  that  supported  by  the  processor.  With  the  use  of  a DMA, 

data  transfer  can  take  place  either  while  the  microprocessor  is 

performing  non-memory  functions,  or  on  a cycle-steal  type  of  transfer 
i.e.,  processor  and  DMA  alternating  in  operation  on  a memory  cycle 
basis.  This  DMA  feature  saves  processing  time.  An  elaborate  DMA 
protocol  description  is  given  in  [33]. 

l!l® — Input/Output  (I/O  Ports).  The  I/O  ports  are  basically  small 
external  registers  in  communication  with  the  real  world. 

3.1.3  The  Audio  Unit 

The  audio  unit  is  designed  to  convert  scene  descriptions  into 
auditory  information.  In  version  I of  Figure  3.1,  the  concept  of  speech 
generation  from  a set  of  digitized  words  is  used.  This  concept  is 
treated  at  great  length  in  the  field  of  speech  analysis  [34],  and  it  has 
been  put  to  practical  uses  as  in  the  case  of  the  talking  calculator  for 
the  blind  [35].  In  version  II,  the  concept  of  automatic  indexing  of 

tape  recorders  is  used  [36].  Figure  3.3  illustrates  the  audio 

conversion  process  using  the  concept  of  version  II.  The  automatic 
indexing  is  simply  a sophisticated  control  process  over  the  familiar 
tape  counter  found  in  the  standard  tape  recording  machines.  As 
illustrated,  the  microcomputer  results  (image  interpretation  results) 
are  given  an  index  by  the  indexing  interface.  The  comparator  contrasts 
this  index  to  the  actual  tape  position  and  feeds  the  information  to  the 
transport  logic  control  which  initiates  the  necessary  reverse,  fast- 
forward,  stop,  or  play  functions  to  retrieve  the  desired  auditory 
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Figure  3.3  Visual/Audio  Conversion  System 
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information.  The  auditory  information,  in  this  case,  is  separated  into 
fixed  format  sentences  such  as  "path  is  clear,"  "you  may  turn 
left/right,  and  into  variables  which  will  provide  the  range 

information.  The  results  will  be  "path  is  clear  for  X steps,"  "you  may 
turn  left/right  after  Y steps." 

The  knowledge  base  in  the  proposed  system  is  integral  to  the 
software  algorithms,  and  is  implemented  in  the  form  of  if-then-else 
rules.  In  conjunction  with  some  of  the  image  techniques  we  suggest,  for 
enhanced  performance,  the  use  of  two  cameras  instead  of  the  use  of 
sequential  frames  from  a single  camera.  In  this  case,  a system  as  shown 
in  Figure  3.4  is  envisaged. 

3.2  Domain  of  Analysis  and  Organizing  Principles 

Before  one  can  develop  software  modules  for  the  interpretation  of 
scenes,  one  must  first  establish  a hierarchical  structure  of  events 
which  are  integral  to  the  domain  of  analysis  of  the  proposed  vision 
system.  In  this  structure,  as  illustrated  in  Figure  3.5,  the  vision 
system  first  determines  whether  the  path  in  front  of  the  blind  is  clear 
or  not.  If  it  is  clear,  the  system  will  trace  a safety  path  for  the 
blind  to  follow.  In  conjunction  with  the  safety  path  tracing,  the  range 
of  the  obstacle-free  path  is  estimated.  If  the  path  is  not  clear,  then 
the  system  will  determine  the  nature  of  the  object  (obstacle).  We 

should  indicate,  however,  that  for  a localized  object,  that  is,  an 

object  which  extends  over  only  a small  area,  the  blind  may  choose  to 
simply  avoid  this  object  rather  than  have  the  vision  system  identify 
it.  In  real  world  scenes,  an  object  in  the  path  of  travel  may  be  either 

a cast  shadow  or  a real  object.  The  real  object,  in  turn,  can  be 
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Figure  3.4  Configuration  of  the  Microcomputer  System  Using  Two  Cameras 


Scene  Analysis  for  the  Blind 
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relatively  flat,  upright,  or  a depression.  With  this  type  of 
classification,  we  can  basically  categorize  all  possible  types  of 
objects  which  can  be  encountered  by  the  blind  into  objects  which  can 
cause  physical  harm  (upright  objects  and  depressions),  objects  which  are 

simply  a false  alarm  (shadows),  and  objects  which  are  relatively  safe 
(flat  objects). 

The  methodology  for  guiding  the  blind  through  a given  scene  is  as 
fol  1 ows : 

(1)  As  an  initial  step,  the  vision  system  takes  left,  front 

and  right  images  of  the  scene  in  front  of  the  blind  to 
acquire  a wide-angle  view.  Each  image  is  analyzed  using 
the  first-pass  evaluation.  The  results  derived  from  the 
three  images  are  integrated  to  yield  an  optimal  tracing 
of  the  safety  path.  We  call  this  step  the 

initialization  phase. 

(2)  The  blind  decides  on  the  most  reasonable  path  to  follow, 

proceeds,  and  the  vision  system  is  directed  to  enter 
what  we  call  the  walking  phase.  In  this  walking  phase, 
the  vision  system  processes  images  in  the  chosen 
direction  of  travel.  The  wide-angle  view  is  no  longer 
necessary,  unless  a major  obstruction  is  encountered  and 
a new  direction  of  travel  is  chosen.  The  image  taking 

process  is  a function  of  the  range  of  the  safety  path. 

For  example,  if  a path  is  obstacle-free  for  x steps, 
then  an  image  may  be  taken  every  x^  steps,  where  x^  is 
the  integer  part  of  the  fraction  x/k  and  k = 2,  3,  4, 

...  depending  on  how  large  x is.  In  this  phase, 
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essential  safety  path  cues  such  as  safe  step,  obstacle 
ahead,  turn  left/right,  are  provided  in  real-time,  and 
the  timing  of  the  vision  system  is  such  that  it  is 
always  processing  a few  steps  ahead  of  the  blind. 

(3)  If  an  object  is  found  along  the  direction  of  travel,  the 
system  issues  a warning  signal  to  the  blind  and  asks 
him/her  to  pause  and  take  another  picture.  The  system 
then  determines  the  range  and  extent  of  the  object,  and 
provides  the  necessary  avoidance  cues.  If 

identification  of  the  object  is  desired,  the  system 
enters  the  identification  process.  We  call  this  step 

the  warning/identification  phase. 

In  the  next  section,  we  describe  the  integrated  vision  system  which 
supports  the  above  methodology. 

3.3  The  Integrated  Vision  System 

The  functional  structure  which  links  the  various  image  techniques 
to  yield  the  integrated  vision  system  is  illustrated  in  Figure  3.6. 
Given  the  complex  nature  of  the  problem,  one  is  faced  with  various 
considerations  pertaining  to  the  physical  nature  of  real-world  scenes 
and  their  projection  onto  the  2-D  image  plane  during  the  design  of  such 
a structure.  Our  approach  to  the  problem  involved  (1)  ascertaining  the 
essential  physical  characteristics  of  scenes;  (2)  finding  the  features 
which  characterize  the  elements  (objects)  generally  found  in  these 
scenes;  (3)  devising  the  appropriate  image  techniques  which,  through 
some  symbolic  representation,  will  extract  and  interpret  these  features 
and  (4)  determining  a logical  order  in  which  these  image  techniques  will 
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Figure  3.6  Integrated  Vision  System 
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be  performed.  In  accordance  with  these  four  steps,  we  describe  below 

the  main  functions  and  the  decision  making  process  of  this  integrated 
vision  system. 

The  first  function  of  the  integrated  vision  system  is  to  provide 
the  blind  with  a safety  path.  To  carry  out  this  function,  the  system 
uses  the  first-pass  evaluation  technique  which,  by  extending  the 
physical  characteristics  of  scenes,  is  devised  to  expoit  the  surface 

consistency  constraint.  The  surface  consistency  constraint  implies  that 
a physical  surface  with  a given  orientation  is  continuous  due  to  the 
coherence  of  matter  and,  consequently,  will  exhibit  uniform 

reflectance.  This  fact,  in  the  2-D  image,  translates  into  a surface 
with  consistent  gray  level  intensities.  Of  course,  one  is  to  expect 
some  surface  irregularities  and  image  noise,  but  these  problems  are 
tolerated  simply  by  setting  an  empirical  gray  level  threshold  to  account 
for  the  ensuing  deviations.  The  surface  consistency  constraint  has  been 
put  to  practical  use  in  the  recovery  of  shape  and  other  features  of 

objects  [37,38],  and  in  the  matching  process  of  some  of  the  stereo 

vision  applications  [39,40],  In  this  system,  the  constraint  is  used  by 
the  first-pass  evaluation  technique  to  plan  a safety  path  for  the  blind 
by  comparing  the  environment  that  is  ahead  with  an  initial  environment 
which  is  already  determined  to  be  obstacle-free. 

The  second  function  of  the  integrated  vision  system  is  to  provide 
the  blind  with  the  needed  additional  information  in  a suitable  and 
organized  way  in  order  to  enhance  the  interpretation  of  the  viewed 

environment.  This  additional  information  is  acquired  in  two  distinct 
modes  of  image  analysis,  as  described  in  the  introduction,  and  is 
accessible  either  directly,  via  the  blind's  command,  or  automatically 


28 


through  the  first-pass  evaluation.  The  direct  access  mode  is  necessary 
in  the  case  where  the  blind  person  desires  a primitive  description  of  a 
detected  object,  or  in  extracting  upright  landmarks  to  help  the  blind 
locate  such  things  as  a bus  stop,  the  corner  of  a building,  or  a 
doorway.  The  automatic  access  mode  is  carried  out  in  the  event  where  an 
object  blocking  the  path  of  travel  is  detected  by  the  first-pass 
evaluation.  A key  issue  of  the  automatic  access  mode  is  to  have  the 
information  processing  tasks  organized  in  a logical  and  efficient  way. 
To  do  so,  we  have  devised  a decision  making  process  based  on  the 
principle  that  the  vision  system  is  to  verify  the  identity  of  an  object 
only  if  some  primary  cues  suggest  its  existence. 

Remember  that  we  have  categorized  all  the  possible  objects  which 
the  blind  may  encounter  into  (1)  shadow,  (2)  depressions,  (3)  upright 
objects,  and  (4)  flat  objects.  First,  let  us  define,  briefly,  the 
essential  features  characterizing  the  above  categories  of  objects. 

(1)  Shadow.  A surface  upon  which  a shadow  is  cast  will  preserve 

its  intrinsic  physical  characteristics.  This  is  due  to  the  relatively 
uniform  effect  of  shadow  on  the  image  gray  level  intensities. 

(2)  Depressi ons . In  approaching  a depression,  one  is  bound  to  see 
new  information  that  was  previously  occluded.  We  call  this  the  occluded 
information. 

(3)  Upright  objects.  Most  man-made  upright  objects  have  straight 

vertical  edges.  The  few  other  man-made  objects  together  with  the 

natural  objects  which  do  not  have  straight  vertical  edges  can  be 
distinguished  by  the  manner  in  which  they  project  onto  the  2-D  image 
plane.  An  upright  object  projects  in  the  2-D  image  plane  proportionally 
to  the  depth  of  field  it  occludes. 
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(4)  Flat  objects.  Flat  objects  are  affected  by  perspective.  Also, 
a flat  object  projects  in  the  2-D  image  plane  proportional ly  to  its 
length. 

Now,  assume  that  an  object  is  detected  by  the  first-pass  evaluation 
and  that  the  nature  of  this  object  needs  to  be  determined.  To  perform 
this  task,  the  integrated  vision  system  initiates  a decision  making 
process  which  consists  of  three  steps  of  analysis.  These  three  steps 
are  the  following: 

(1)  Determine  if  the  primary  cues  which  allow  for  shadow 
identification  exist.  This  is  done  by  checking  the 
following  conditions: 

(a)  Determine  if  the  area  of  the  detected  object  is 
within  reasonable  range.  This  is  to  ensure  that 
the  intrinsic  characteristics  of  the  object  are 
visible.  The  range  in  consideration  will  depend  on 
the  resolution  of  the  camera  used. 

(b)  Check  if  the  gray  level  intensities  of  the  area 
under  analysis  are  within  the  gray  level  range  of 
the  shadow.  This  condition  is  to  account  for  the 
fact  that  a shadow  has  a minimal  effect  on  the  gray 
level  intensitites,  and  if  it  is  too  dark  then  the 
intrinsic  characteri sties  of  the  area  will  be 
concealed,  and  shadow  identification  is  no  longer 
feasible. 

(c)  Determine  if  the  effect  of  the  object  in  question 
on  the  gray  level  intensities  is  uniform. 
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If  all  three  conditions  (a),  (b)  and  (c)  are  satisfied, 
the  shadow  identification  process  is  initiated.  Else, 
step  2 below  is  performed. 

(2)  Determine  if  primary  cues  for  a depression  exist.  In 
this  step,  using  a sequence  of  frames,  the  system 
determines  if  new  information  (occluded  information)  is 
revealed  by  comparing  an  initial  frame  to  another  frame 
taken  at  a closer  range  from  the  detected  object.  If 
new  information  is  revealed,  the  system  initiates  the 
depression  identification  process.  Else,  step  3 below 
is  performed. 

(3)  Distinguish  upright  objects  from  flat  ones.  This  step 

is  performed  last  because,  unlike  steps  1 and  2 which 
lend  themselves  to  real-time  processing,  there  are  in 
this  case  no  primary  cues  which  can  be  extracted  in  real 
time  with  the  available  hardware.  Furthermore,  unlike 
steps  1 and  2 which  are  contained  in  mode  1 of  analysis, 
this  step  is  contained  in  mode  2 of  analysis  which 
requires  the  additional  processing  step  of  either  edge 
detection  or  binary  image  generation.  This  step  is 

performed  as  follows: 

(a)  Using  the  edge  detection  technique,  check  for  the 
presence  of  straight  vertical  edges.  If  straight 
vertical  edges  exist,  inform  the  blind  of  their 
location.  Else,  perform  step  (b)  below. 

(b)  Using  the  binary  image,  evaluate  the  2-D  projection 

properties  which  distinguish  an  upright  object  from 
a flat  one. 
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We  should  note  that  when  all  three  steps  of  the  decision  making 
process  fail  to  produce  conclusive  results,  for  safety  purposes,  the 
object  is  declared  an  obstacle  regardless  of  its  real  nature.  To 

complete  identification  in  this  case,  a close  range  image  of  the  object 
is  necessary.  Let  us  also  indicate  that  the  broken  arrows  in  Figure  3.6 
are  an  indication  of  opportunities  for  concurrent  processing. 

3.4  Mapping  Principles  Between  the  2-D  Image  and  the  3-D  Real  World 

In  the  chapters  that  follow,  analysis  is  performed  in  the  2-D 

digitized,  but  the  audio  results,  which  are  conveyed  to  the  blind,  must 
conform  to  the  3-D  real-world  environment.  This  section  derives  the 

formulae  to  allow  for  such  a transformation.  These  formulae  can  also  be 

used  to  focus  the  image  processing  on  specific  areas  only. 

Digitized  image-to-actual  image  mapping.  The  unit  of  a digitized 
image  is  a picture  element  or  pixel  while  the  unit  of  the  actual  image 
is  a millimeter.  Using  the  resolution  factor  (p)  of  the  digitizing 
device  used,  an  NxN  digitized  image  is  transformed  into  an  NpxNp  actual 
image.  In  our  digitizing  device  the  resolution  p is  0.1  mn/pixel. 

Actual  image-to-real  world  mapping.  In  this  mapping,  we  assume  a 

relatively  flat  ground  plane,  and  that  the  camera  lens  is  at  the 

infinity  setting,  i.e.,  the  distance  from  the  film  to  the  lens  is  equal 

to  the  focal  length  (f)  of  the  lens.  Deviations  in  the  case  where  the 

ground  plane  is  not  flat  are  assessed  later.  Given  Figure  3.7,  using 

properties  of  similar  triangles,  an  image  area  delimited  by  x,-  x-.  v • 

J I * J 9 J 1 

and  yj  in  the  (x,y)  image  coordinate  system  is  mapped  in  width  (W)  and 
range  (R)  in  the  real-world  environment  by  the  following  relationships: 
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R(yi,yJ.)  = R(yj)  - R(y.) 


where 


R(y)  = 


(f  + y tan  a)  h tan  (B  + a)  - fh  tan  a 
f + ( f tan  a - y)  tan  ( B + a) 


/ ^ f + »y  i ) 

w(vxj)  = — : — — - *i> 


(3.1) 

(3.2) 

(3.3) 


where  h is  the  camera  height.  The  term  R(y0,yi)  in  equation  (3.3)  is 
determined  using  relation  (3.1)  and  equation  (3.2)  with  yQ  = 0.  Note, 
for  example,  if  the  image  plane  is  untilted,  i.e.  a = 0,  then  by 
combining  relation  (3.1)  and  equation  (3.2),  we  obtain 


R(y-i  »yj ) 


fhL 


2 


(th  - yjL)(fh  - y.L) 


(3.4) 


where  L is  the  range  between  the  camera  and  the  first  point  viewed  by 
the  camera. 

With  respect  to  the  idea  of  having  the  image  analysis  focused  on  a 
specific  region  of  the  environment,  let  us  treat  an  example  where  it  is 
desired  to  process  the  environment  that  is  within  a range  R(yk)  from  the 
blind.  R(yk)  is  simply  L + R(y0,yk).  This  problem  reduces  to  finding 
the  image  coordinate  yk  corresponding  to  the  range  R(yk).  From  equation 
3.2,  yk  is  derived  as 


f[R(yk)  + h tan  ct]  + f[R(yk)  tan  a - h]  tan  (B  + a) 

y k 1 ■ — 

[R(yk)  + h tan  a]  tan  (B  + a) 

Again,  if  we  let  a = 0 , equation  (3.5)  takes  the  simple  form 


yk 


fh[R(yk)  - L] 

lr<*k> 


(3.6) 
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Figure  3.8  illustrates  an  example  of  how  yk  varies  as  a function  of 
parameter  L.  The  other  parameters  are  in  this  case  f = 35  mm,  h = 1.5 

m,  and  R(yk)  = 10m.  Similar  analysis  can  be  made  for  various  other 
camera  viewing  conditions. 

We  should  indicate  that  for  simplicity  in  the  derivation  of  the 
above  formulae,  we  have  assumed  a fixed  camera  system,  infinity  setting 
of  the  camera  lens,  and  normal  range  computations.  A detailed  study 
which  accounts  for  various  other  physical  and  mathematical  properties  of 
cameras  and  camera  lenses  is  given  in  [41]. 

In  the  next  two  chapters,  we  detail  the  image  techniques  of  the 
integrated  vision  system. 
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CHAPTER  IV 

SCENE  ANALYSIS  FOR  SAFETY  PATH  PLANNING 

In  this  chapter  the  three  image  techniques  which  are  used  in 
conjunction  with  the  safety  path  planning  are  described  in  detail.  The 
first  section  describes  the  wide-angle  view  technique.  Both  the 
configuration  of  this  technique  and  the  procedure  used  to  obtain  an 
optimal  safety  path  from  the  three  processed  images  of  the  wide-angle 
view  are  described.  The  second  section  describes  the  first-pass 
evaluation  technique.  The  first-pass  evaluation  technique  is  applied  to 
each  image  to  generate  both  the  safety  path  and  a global  interpretation 
of  the  given  image.  The  perspective  effect  and  the  range  finding  scheme 
supplementing  this  technique  are  presented.  The  first-pass  evaluation 
technique,  which  is  contained  in  mode  1 of  scene  analysis,  utilizes  the 
second-pass  evaluation  technique  for  assistance  in  providing  primitive 
descriptions  of  the  objects  detected.  This  last  technique  which 

processes  the  binary  image  and  is  contained  in  mode  2 of  scene  analysis 
is  described  in  the  third  section. 

4. 1 Wide-Angle  View  Technique 

In  humans,  the  total  angle  of  view  is  approximately  220°,  and  this 
includes  the  peripheral  vision.  Of  course,  the  peripheral  vision  is  not 
as  distinct  as  the  frontal  vision,  but  without  it,  it  is  quite  certain 
that  our  sense  of  direction  would  be  quite  impaired,  especially  in 
unknown  surroundings.  Without  peripheral  vision  most  of  us  would  be 
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forced  to  look  to  the  left  then  to  the  right  before  proceeding  to  some 
destination.  It  is  based  on  this  idea  of  the  importance  of  peripheral 
vision  that  we  have  devised  the  wide-angle  view  technique,  which  Tou 
proposed  originally  in  study  [42]. 

Two  practical  steps  constitute  the  wide-angle  view  technique.  One 
is  the  acquisition,  by  the  vision  system,  of  a wide-angle  view  by  taking 
left,  front,  and  right  images  of  the  environment  in  the  vicinity  of  the 

blind.  The  second  and  most  important  step  is  the  integration  of  the 

safety  path  results  obtained  from  the  images  which  are  processed 
independently  by  the  first-pass  evaluation.  This  integration  process 
yields  safety  path  results  which  allow  the  blind,  when  possible,  to 

cross  from  one  environment  to  another  (front  to  left  or  front  to  right) 
in  an  optimal  way. 

4.1.1  Configuration  of  the  Wide-Angle  View 

A configuration  of  the  wide-angle  view  is  illustrated  in  Figure 
4.1(a).  This  configuration  is  characterized  by  two  angles  of  view  of 

the  camera.  These  are  the  horizontal  angle  of  view  0,  and  the  vertical 
angle  of  view  «>.  From  simple  triangulation,  these  two  angles  are 
determined  using  the  relation  below: 

V(q)  = 2 arctan  (q/2f) 

where  q depends  on  the  type  of  film  used.  For  example,  if  we  use  a 35 
mm  lens  ( f = 35  mm)  and  a 35  mm  film  whose  single  frame  dimension  is 
35x24  mm,  then,  using  equation  4.1,  the  horizontal  angle  of  view,  9,  is 
derived  by  substituting  the  frame  length  for  q,  as  9 = V(35)  - 53° 
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(a)  Configuration  of  the  Wide-Angle  View 


Left  View 


Right  View 


(b)  Parameters  of  the  Wide-Angle  View 


Figure  4.1  The  Wide-Angle  Technique 
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and  the  vertical  angle  of  view,  <j>,  is  derived  by  substituting  the  frame 
width  for  q,  as  <j>  = V( 24)  - 35°  . To  appreciate  the  purpose  of  the 

wide-angle  view,  Figure  4.2  contrasts  the  example  of  a view  obtained 
from  one  image  with  that  of  the  wide-angle  view  which  combines  left, 
front,  and  right  images. 

Ideally,  the  result  of  the  wide-angle  view  must  look  like  the 
example  in  Figure  4.2(b).  However,  unless  a rigid,  well  controlled 
camera  system  similar  to  Moravec's  [19]  is  devised,  one  must  expect  some 
overlap  between  the  three  images  as  in  the  case  shown  in  Figure  4.3. 
Small  overlaps  are  tolerated  by  the  wide-angle  view  technique  since  its 
purpose  is  to  optimize  the  tracing  of  the  safety  paths  rather  than  the 
reconstruction  of  the  wide-angle  view  itself. 

As  a note  of  caution,  we  should  indicate  that  in  some  of  the  wide- 
angle  view  examples,  it  is  observed  that  a given  surface  may  appear 
darker  or  lighter  in  either  the  left  or  the  right  view  with  respect  to 
the  front  view.  This  is  due  to  the  different  amount  of  light  entering 
the  lens  in  each  direction.  To  correct  this  problem,  it  is  advised  to 
use  a camera  with  automated  aperture  that  regulates  the  amount  of  light 
entering  the  lens. 

4.1.2  The  Integration  Process  of  the  Wide-Angle  View 

Each  image  of  the  wide-angle  view  is  processed  using  the  first-pass 
evaluation  technique  described  in  the  next  section.  To  remain  with  the 
description  of  the  wide-angle  view  technique,  let  us  assume  that  the 
first-pass  evaluation  has  been  performed  on  each  image  and  that  all  the 
necessary  parameters  of  the  wide-angle  view  have  been  determined.  With 
reference  to  Figure  4.1(b),  parameters  PI,  Pf,  and  Pr  correspond  to  the 
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Figure  4.3  Image  Overlap  in  the  Wide-Angle  View 
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safety  paths  for  left,  front,  and  right  images.  Parameters  C.-  • denote 

J 

the  various  clearances  where  the  first  index,  i,  identifies  the  image 
and  has  values  of  1,  f,  and  r for  left,  front,  and  right  images, 
respectively.  The  second  index,  j,  identifies  the  direction  and  has 
values  of  1 and  r for  left  and  right  direction,  respectively.  To  each 
of  these  parameters,  we  associate  a range  value  estimated  using  the 
range  finding  scheme  presented  in  section  4.2.2.  Given  these  parameters 
and  referring  back  to  Figure  4.1(b),  the  integration  process  which 
yields  an  optimal  safety  path  is  performed  using  the  following  step-wise 
procedure: 

(1)  Determine  the  left  and  right  optimal  clearances  C-j  and 
Cr  of  the  wide-angle  view  given  by 

C.j  - min  (C^i  , C^)  and  = min  (C^,  ) 

(2)  If  clearance  C-|  and  the  path  of  the  left  image  P-|  exist, 
determine  both  the  angle  and  path  of  the  wide-angle  view 
in  the  left  direction 

(a)  The  angle  of  the  wide-angle  view  in  the  left 
direction  is  given  by 

A1  = arctan  [min(C]  , P]  )/Lw] 

where  Lw  is  a function  of  the  horizontal  angle  of 

view  0.  For  example  if  8 - 53°  as  derived 

earlier,  then  - L.  L is,  as  defined  earlier, 

the  range  between  the  camera  and  the  nearest  point 
viewed  by  the  camera. 
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(b)  The  path  of  the  wide-angle  view  in  the  left 

direction  is  given  by 

Px1  = LW/C0S(V 

(3)  Perform  a similar  analysis  (as  in  step  2)  for  the  wide- 
angle  view  in  the  right  direction. 

(a)  The  angle  of  the  wide-angle  view  in  the  right 

direction  is  given  by 

V = arctan  [min(Cr,Pr)/Lw] 

(b)  The  path  of  the  wide  angle  view  in  the  right 
direction  is  given  by 

pv  = Vcos(Ar) 

(4)  In  the  case  where  the  wide-angle  view  in  the  left 

direction  is  chosen,  determine  the  remaining  portion  of 

path  P-|  denoted  by  P-i  which  extends  beyond  path  P 

1 

and  is  given  by 


Pli  = pi  - Lwtan(X1) 

(5)  Similarly,  in  the  case  where  the  wide-angle  view  in  the 

right  direction  is  chosen,  determine  the  remaining 

portion  of  path  Pp  denoted  by  Pp^  which  extends  beyond 

path  P^  , which  is  given  by 
r 
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Pri  ' Pr  - Lwtan(ir) 

(6)  The  additional  information  of  veers  or  turns  which  may 
take  place  after  either  path  P-,^  or  P can  be 

determined  using  the  same  analysis  as  that  performed  in 
steps  2(a)  and  2(b)  or  3(a)  and  3(b),  respectively. 

Results  of  a computer  implementation  of  this  procedure  applied  to 
outdoor  scenes  are  illustrated  in  Figure  4.4. 

4.2  First-Pass  Evaluation  Technique 

A basic  assumption  made  in  the  first-pass  evaluation  is  that  the 

immediate  surroundings  of  the  initial  position  of  the  blind,  i.e.  an 

area  which  can  be  ci rcumscribed  by  the  reach  of  a cane,  is  a safe  or 

obstacle-free  area.  This  assumption  together  with  the  surface 

consistency  constraint,  constitute  the  core  of  the  first-pass  evaluation 

technique.  The  first  step  in  the  implementation  of  this  technique 

consists  of  partitioning  the  image  to  be  processed  by  a virtual  grid 

whose  unit  is  a cell  containing  10x10  pixels,  as  seen  in  Figure  4.5. 

This  partitioning  scheme  is  used  to  facilitate  image  processing. 

For  the  initial  obstacle-free  area,  the  first-pass  evaluation 

computes  an  optimal  regional  average  gray  level  (Gr)  for  use  as  a 

reference.  This  Gr  is  found  through  an  iterative  process  which  starts 

from  an  initial  regional  average  gray  level  Gr  , where  each  G_  , in  the 

1 ~i 

iterative  process,  is  computed  using  the  following  equation. 

1 L2  m 

G = — [ T Z G(  k , 1 ) ] 

1 N.  k=L  1=1 


(4.2) 
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Figure  4.4  Computer  Results  of  the 
Wide-Angle  View 
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Figure  4.4— continued 
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(a)  Image  to  be  Processed 
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(b)  The  Initially  Safe  Area 


Figure  4.5  First-Pass  Evaluation  Technique 
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where  G(k,l)  represents  the  average  gray  level  value  of  a unit  cell  of 
the  grid,  and  k and  1 denote  the  column  and  row  level  of  the  grid, 
respectively,  is  the  number  of  cells  considered  from  the  safe  area, 
and  Nj  being  the  number  of  cells  contained  in  the  initial  safe  area,  m 
is  a function  of  the  size  of  the  safe  area  considered  and  L1  and  l_2 
constitute  the  limits  of  a path  wide  enough  for  the  blind  to  walk 
through  freely.  Note  that  in  the  image  to  be  processed,  and  L2  vary 
as  a function  of  range  to  account  for  the  perspective  effect  which  is 

assessed  in  section  4.2.1.  Once  Gr  is  evaluated,  all  the  cells  in  the 
safe  area  are  checked  to  see  if  the  corresponding  values  of  G(k,l) 

satisfy  the  condition  below  (with  i =1). 

Gr_  - T < G(k,l  ) < Gp . + T (4.3) 

T is  an  empirical  gray  level  threshold  which  accounts  for  those  cells 

which  have  large  gray  level  deviations  in  contrast  to  the  value  G (i  = 

ri 

1).  These  deviations  are  generally  caused  by  debris  or  dirt  spots.  In 

terms  of  standard  deviation  (a)  of  the  gray  levels,  this  threshold  can 

be  appropriately  set  to  T = 0.5a.  The  cells  which  do  not  satisfy 

condition  (4.3)  are  dropped  in  the  next  evaluation  of  the  regional 

average  gray  level  G The  process  is  repeated  for  G_  with  No  = Ni  - 

c r 2 l 

(number  of  dropped  cells),  using  equation  (4.2)  with  i set  to  2.  This 
iterative  process  continues  for  increasing  i until  all  the  cells  that 
are  considered  satisfy  condition  (4.3).  The  Gr.  satisfying  this 
condition  is  considered  the  optimal  regional  average  gray  level  which  we 
denote  by  Gr.  This  iterative  procedure  ensures  that  the  area  considered 
from  the  initially  safe  area  has  a consistent  surface. 
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Referring  back  to  Figure  4.5,  with  Gp  evaluated,  given  an  NxN 
image,  the  first-pass  evaluation  technique  performs  the  following 
analysi s : 

(1)  Walking  straight  ahead.  This  step  is  performed  to  plan 
a safety  path  for  the  blind  in  the  forward  direction. 

In  this  step,  the  system  checks  the  following  safety 
condition: 

Gr  - T < G( k ,1 ) < . Gp  + T 

where  1 - 1,  2,  ...,  1*  and  k = L^,  ...,  1*  is  the 

row  level  of  the  grid  where  the  incident  ray  of  the 
camera  is  actually  parallel  to  the  ground  plane,  i.e. 
the  camera  is  no  longer  viewing  the  ground  plane  but  is 
viewing  the  horizon.  Section  4.2.1  explains  this 
fact.  For  each  row  level  1,  if  the  condition  above  is 
satisfied,  the  row  level  is  declared  obstacle  free, 
otherwise,  a warning  of  a possible  obstacle  is  issued. 

In  the  walking  phase  (refer  back  to  section  3.2),  audio 
messages  of  "safe  step"  or  "obstacle"  can  be  issued  for 
every  row  level.  In  the  initialization  phase  (section 
3.2),  all  levels  up  to  1*  are  processed  before  the  audio 
message  "your  front  path  is  clear  for  X steps"  is 
issued,  where  X is  a variable  supplying  the  range 
information.  We  should  indicate  that  when  an  obstacle 
warning  is  issued  for  more  than  three  consecutive  row 
levels,  this  is  an  indication  of  a major  obstacle,  and 
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the  walking  straight  ahead  analysis  terminates. 
Avoidance  cues,  if  they  exist,  are  provided,  and  the 
identification  process  is  initiated  via  the  automatic 
access  mode  defined  in  section  3.3.  We  use  three  row 
levels  to  allow  for  "ON"  and  "OFF"  type  of  paths  which 
are  used  as  an  initial  indication  for  the  presence  of  a 
staircase  or  a crosswalk  (see  section  5.4) 

(2)  Initial  information  about  the  object.  This  step  is 

performed  to  provide  initial  knowledge  on  the  extent  of 

the  object,  and  to  determine  if  it  can  be  avoided  or 
not. 

(a)  Determining  the  extent  of  the  object:  Since  we  are 

concerned  with  normal  range  type  of  application  in 
this  case  (e.g.,  ranges  within  10  m),  it  is 
reasonable  to  assume  that  an  object  whose  length 
(for  a flat  object)  or  height  (for  an  upright 
object)  extends  above  three  row  levels  in  range 
constitutes  a large  object,  and  the  straight  ahead 
analysis  should  terminate.  Furthermore,  if  the 
extent  of  the  object  exceeds  the  area  delimited  by 
L1  and  l2»  this  object  is  assumed  to  be  a large 
object,  and  the  direct  access  mode  is  initiated  for 
its  identification. 

(b)  Determining  if  the  object  is  avoidable:  This  will 

depend  on  whether  any  cells,  in  the  same  row  level 
as  the  object,  exist  either  to  the  left  or  to  the 
right  of  the  object,  satisfying  the  condition 
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Gr  - T < G(k,l)  < Gp  + T . If  they  exist,  the 
mid-point  of  these  cells  is  declared  as  a next 
destination.  If  they  do  not  exist,  a dead-end  is 
declared. 

(3)  Making  a left  or  a right  turn.  This  step  is  performed 
when  an  obstacle  is  detected  in  the  direction  of 
travel.  The  two  types  of  turns  considered  are  (a)  a 
turn  to  avoid  a large  obstacle,  and  (b)  a turn  to  avoid 
a dead-end.  In  either  case,  the  system  determines  the 
number  of  consecutive  cells  in  the  row  level  preceding 
the  obstacle  that  satisfy  the  condition 

Gr  - T < G(k,l)  < Gp  + T 

where  k = Lm,  ...,  1 for  a left  turn,  and  k = Lm,  ..., 
N*  for  a right  turn,  where  N*  = N/10  and  denotes  the 
central  column  of  the  grid.  The  mid-point  of  these 
consecutive  cells  is  declared  as  the  next  destination. 
In  the  walking  phase,  the  audio  message  is  "turn 
left/right."  In  the  initialization  phase,  the  audio 
message  is  "turn  left/right  after  X steps." 

(4)  Permissible  left  or  right  turns.  This  step  is  performed 
to  let  the  blind  know  if  a left  or  a right  turn  is 
permissible  for  the  purpose  of  changing  the  direction  of 
travel.  The  safety  condition  is 


Gp  - T < G(k,l  ) < Gp  + T 
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where  k = Lm,  ...,  1 for  a left  turn,  and  k = Lm,  .... 

N*  for  a right  turn.  Next,  determine  the  number  of  row 
levels  satisfying  the  condition  above  for  both  the  left 
and  right  direction.  These  will  constitute  the  left  and 
rignt  clearances  respectively.  The  resulting  audio 
message  is  "you  may  turn  left/right"  in  the  walking 
phase,  and  "you  may  turn  left/right  after  X steps  and 
the  clearance  is  Y steps  wide"  in  the  initialization 
phase. 

Experimental  results  of  the  first-pass  evaluation  using  various  3-D 
scenes  are  shown  in  Figure  4.6.  In  these  results,  we  should  indicate 
that  the  safety  markers,  generated  by  the  vision  system  for  visual 
evaluation,  is  a nonlinear  function  of  range  (refer  to  section  4.2.1). 

Note  that  extensions  of  this  first-pass  evaluation  technique  are 
possible.  For  example,  if  a more  elaborate  scene  analysis  is  desired, 
one  could  implement  the  perspective  effect  on  the  cells  themselves. 
Also,  to  suit  environments  with  curved  paths  as  in  cases  (e)  and  ( f ) of 
Figure  4.6,  a variation  of  the  straight  ahead  analysis  can  provide  the 
desired  path.  In  these  cases,  however,  a least  squares  approximation  is 
required  to  estimate  the  direction  of  the  path.  This  is  fine  for  case 
(e),  but  case  (f)  (a  rare  case  fortunately)  would  require  piecewise 
least  squares  approximations  to  estimate  the  path. 

In  the  next  two  sections,  we  describe  the  perspective  effect  and 

the  range  finding  scheme  which  supplement  this  first-pass  evaluation 
techni que. 
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Figure  4.6  Computer  Results  of  the  First-Pass  Evaluati 
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Figure  4.6--continued 
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4*2.1  An  Implementation  of  the  Perspective  Effect 

The  perspective  effect  used  in  the  first-pass  evaluation  is  based 
on  a simple  mapping  scheme  of  the  real-world  environment  onto  the 
picture  plane.  For  known  camera  parameters,  this  task  reduces  to 
finding  the  geometrical  relationships  between  the  real-world 
measurements  of  range  and  width  with  their  respective  projections  onto 
the  picture  plane.  The  advantages  of  taking  this  perspective  effect 
into  account  are  (a)  the  image  analysis  conforms  to  the  real-world 
representation,  (b)  processing  time  and  results  of  the  walking  straight 
ahead  analysis  are  enhanced  since  only  the  desired  vicinity  of  the  blind 
is  considered  at  any  given  level  of  the  virtually  partitioned  image,  (c) 
if  an  object  is  detected  within  the  path  of  travel,  its  size  is  easily 
estimated. 

To  establish  the  desired  geometrical  relationships,  we  use 
triangulation.  Referring  to  Figure  4.7(a),  we  find  that  the  projection 
(P)  of  a segment  of  length  (r)  between  a point  at  range  Rn  and  another 
point  at  range  Rn+^  is  given  by  the  following  relationships: 


f 


h 


h 


(4.4) 


where 


fh 


h 


(4.5) 


n 


R 


n 


combining  equations  (4.4)  and  (4.5),  we  obtain 


fh  (Rn+1  ~ Rn}  cos  a 

(Rp  COS  a + h sin  a)  (R 


cos  a + h sin  a) 


(4.6) 
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Figure  4.7  Geometry  of  the  Perspective  Effect 
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where  Rp  = L + nr  and  n = 0,  1,  ....  k is  a positional  index  of  the 
range  points  considered.  Parameters  f,  h,  and  a are  as  defined 
before.  Note  that  in  the  case  where  the  tilt  angle  a = 0 i.e.  when 
the  camera  image  plane  is  perpendicular  to  the  ground  plane  which  is 
assumed  flat,  equation  (4.6)  above  takes  the  simple  form 


fhr 


Rn  Rn+1 


(4.7) 


Similarly,  referring  to  Figure  4.7(b),  we  find  that  the  projection 
(P)  of  any  segment  of  width  (w ) is  given  by  the  following  relationship: 


fw 

P(n,w)  = 

L + nr 


(4.8) 


Given  the  camera  viewing  position  and  the  camera  parameters,  using 
the  above  projection  relationships,  the  perspective  effect  is  easily 
determined.  This  actual  perspective  effect  is  then  implemented  to 
conform  with  the  virtual  partitioning  used  in  the  first-pass 

evaluation.  Figure  4.8  shows  an  example  of  an  outdoor  scene  where  an 
actual  perspective  effect  is  contrasted  with  its  implemented  version. 

4*2.2  A Specialized  Range  Finding  Scheme 

The  recovery  of  the  third  dimension  or  the  acquisition  of  the  range 
information  (depth)  constitutes  another  essential  task  in  the  design  of 
a computer  vision  system.  In  recent  years,  various  range  finding 
techniques  have  been  proposed  [43-51].  One  group  of  these  techniques 
[43-46]  uses  the  approach  of  assessing  range  directly  from  its  effect  on 
image  features  such  as  texture,  shading,  occlusion  cues  and  image 
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Figure  4.8  Implementation  of  the  Perspective 
Effect  on  an  Outdoor  Scene 
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projection  of  artificial  light  patterns  cast  on  the  viewed  scene.  The 
other  group  of  techniques  [47-51]  are  basically  a take  off  from  the 
human  visual  system  and  their  approach  is  to  exploit  the  stereo  vision 
principle.  The  stereo  vision  principle  is  based  on  the  measurement  of 
the  disparity  that  exists  between  stereo  images.  These  stereo  images 
are  acquired  either  simultaneously  using  two  cameras  with  a lateral 
displacement  (conventional  stereo),  or  sequentially  using  a single 

camera  and  involving  a forward  displacement  (structure  from  motion, 
optical  flow). 

Experimentation  with  the  first  group  of  techniques  has  been  limited 
to  laboratory  settings  or  to  simple  scenes.  In  theory,  the  techniques 
in  the  second  group  have  the  potential  to  achieve  generality;  however, 
in  practice  these  techniques  face  the  extremely  difficult  problem  of 
image  correspondence.  The  image  correspondence  process  matches 
identical  points  or  locations  in  the  stereo  images  to  determine  their 
disparity  which  through  triangulation  can  be  used  to  infer  the  range 
information.  Although  both  groups  have  made  considerable  progress  in 
the  last  decade,  it  would  appear  that  the  techniques  in  the  first  group 
will  essentially  remain  constrained  to  special  applications.  On  the 
other  hand,  various  indications  of  the  second  group  of  techniques 

suggest  that  it  may  only  be  a matter  of  time  before  a robust  stereo 
vision  algorithm  is  developed. 

In  our  approach  to  the  range  finding  problem,  we  assume  that  the 
paths  that  the  vision  system  is  to  confront  have  ground  surfaces  that 
are  relatively  flat.  This  is  true  for  most  indoor  scenes  and  various 
outdoor  scenes.  We  also  assume  that  the  vision  system  considers  only 
those  image  areas  within  reasonable  range  of  the  blind  at  any  given 


62 


time.  Paths  with  slight  inclinations  that  violate  the  assumption  of 
flat  ground  surfaces  are  accounted  for  by  evaluating  the  range  errors 
due  to  these  inclinations  (see  Chapter  VI).  Paths  that  have  large 
inclinations  can  be  sensed  by  the  blind,  who  then  can  deduce  that  the 
vision  system  is  either  overestimating  the  range  (if  the  path  had  an 
upward  inclination)  or  underestimating  the  range  (if  the  path  had  a 
downward  inclination).  For  this  aspect  of  the  problem,  a more 
generalized  approach  to  range  finding  is  required.  In  Chapter  VII,  we 
suggest  an  approach  to  determine  disparity  from  stereo  images  for  the 
purpose  of  a generalized  range  finding  algorithm. 

The  specialized  range  finding  scheme  described  here  is  devised  to 
conform  to  the  virtual  partitioning  used  in  the  first-pass  evaluation. 
The  objective  is  to  map  each  level  (1)  of  this  partitioning  to  the 
corresponding  range  (R)  in  the  real-world  environment.  Referring  to 
Figure  4.9,  through  triangulation,  this  relationship  is  given  by 

(f  + ID  tan  a)  h tan  (3  + a)  - fh  tan  a 

R-](D)  = (4.9) 

f + (f  tan  a - ID)  tan  (6  + a) 

where  D = pDq  , and  Dq  denotes  the  size  of  the  unit  cell  of  the  virtual 
partitioning. 

Using  the  above  equation  if,  for  example,  the  first-pass  evaluation 
finds  that  a safety  path  is  4 unit  cells  long,  then  the  range  of  this 
path  is  computed  as  R^(D).  Again,  if  we  consider  a = 0 , equation 
(4.9)  above  takes  the  simple  form 

fhL 


R-|  (D)  = 


fh  - 1LD 


(4.10) 
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Note  that  when  1 = 1*  = ( f + f tan  a tan  (3  + a))  / (D  tan  (3  + ct)) 
equation  (4.9)  is  undefined.  This  is  due  to  the  fact  that  the  incident 
ray  of  the  camera  is  now  parallel  to  the  ground  plane.  The  first-pass 
evaluation  terminates  at  this  level.  Note  that  in  order  to  enhance  the 
range  estimation  using  this  approach  one  could  reduce  the  size  of  the 
unit  cell  (Dg)  of  the  virtual  partitioning. 

4.3  Second-Pass  Evaluation  Technique 

The  primary  objective  of  the  second-pass  evaluation  technique  is  to 
provide  a primitive  description  of  the  object(s)  detected  by  the  first- 
pass  evaluation.  This  primitive  description  is  obtained  from  the  binary 
image  or  what  is  also  called  a segmented  image,  generated  using  an  image 
segmentation  technique.  Image  segmentation  techniques  [52-55]  are  basic 
techniques  used  generally  in  the  initial  stages  of  image  processing  to 
facilitate  image  interpretati on  or  object  identification.  This  is  due 
to  the  fact  that  the  result  of  image  segmentation  is  either  an  outline 
of  the  object(s)  in  the  scene  (edge  detection)  or  the  extraction  of  the 
object(s)  (region  extraction)  with  the  irrelevant  information  removed 
(in  this  case  the  ground  or  background  of  the  image). 

The  central  theme  of  the  aforementioned  segmentation  techniques  is 
the  determination  of  the  proper  threshold  or  feature,  based  on  either 
the  gray  level  intensities  or  texture  or  other  image  cues,  which  will 
separate  the  sought  after  object(s)  from  the  rest  of  the  image.  In  our 
approach,  the  region  or  area  of  the  object  is  extracted  from  ground 
based  on  the  gray  level  variation  that  exists  between  the  initial  area 
assumed  obstacle  free  and  the  object  area. 
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The  gray  level  threshold  used  to  separate  object  from  ground  is 
computed  as  follows: 


where  Gp  is  the  optimal  regional  average  gray  level  defined  earlier,  and 
Gobj  is  the  9ray  level  average  of  an  area  within  the  object  area. 

Recall  that  the  extent  of  the  object  has  been  determined  by  the  first- 
pass  evaluation.  With  threshold  obtained,  we  perform  one  of  the 
following  simple  steps  to  extract  object  from  ground,  which  in  this  case 
is  to  generate  a binary  image. 

(a)  If  Gg^j  > 'Gp»  si  1 points  in  the  image  whose  gray  levels 
exceed  Ts  are  set  to  1,  all  others  are  set  to  zero. 

(b)  If  G0|gj  < Gp,  all  points  in  the  image  whose  gray  levels 

are  less  than  Ts  are  set  to  1,  all  others  are  set  to 

zero. 

To  facilitate  the  binary  image  generation  and  to  speed  up  the 

processing  time,  the  focus  is  placed  only  on  the  area  starting  from  a 
point  near  where  the  first-pass  evaluation  has  indicated  the  presence  of 
the  object.  From  this  point,  an  nxn  virtual  grid  is  superposed  over  the 
remaining  area  of  the  image  (see  Figure  4.10).  Since  a binary  image  is 
used  in  this  case,  a different  notation  is  used  for  the  cells  of  the 

grid.  The  cell  which  is  still  a 10x10  array  is  denoted  by  Ck,  k = 1,  2, 
....  n , 1S  the  number  of  the  cell.  Parameter  G(Ck)  denotes  the  average 
gray  level  of  the  cell,  which  in  this  case  ranges  from  0 for  a blank 
cell  (background  cell)  to  100  for  a completely  black  cell  (object 
cell).  With  this  information,  the  following  two  steps  are  performed: 


->  '< 
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Figure  4.10  Second-Pass  Evaluation  Technique 
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(1)  Quantize  the  average  gray  level  values  of  G( ) as 
fol 1 ows : 


G (Ck)  < 10 

10  < G (Ck)  < 35 

35  < G (Ck)  < 55 

55  < G (Ck)  < 75 

G (Ck)  > 75 


Ck  = 0 
Ck  = 0.25 
Ck  = 0.50 
Ck  = 0.75 
Ck  = 1 


(2)  Determine  the  cumulative  cell  values  given  by 
(a)  for  a horizontal  overlay 


= *ml  CHn(,rl, 

(b)  for  a vertical  overlay 


l-i  1,2, ... ,n 


ckl+„(i-i, 


k-i  = 1,2,. . . ,n 


and 


(c)  for  a diagonal  overlay  with  a left  inclination 


^ -j=i  ^kj+(  i -1 )(  n-1 )+(  n+1 )(  n-k^)  ^1  1,2,.  ..,n 


Dr2(kl>  = ^ Ck  j+(  i-l)(  n-1) 


k.  = 1,2,.. . ,n-l 


(d)  for  a diagonal  overlay  with  a right  inclination 


and 


DU(kl>  = C(n+l),'tn(n-krl) 


D12(kl>  ' A C(n+l),-k1 


k ^ 1,2,. ..,n 

k1  = 1,2, . . . ,n-l 
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With  the  above  two  steps,  the  following  assessment  about  the  object 
is  made. 

(1)  If  all  values  of  Hc(l1)  = o ^ = 1,2,. ...n,  this 

indicates  a false  alarm  by  the  first-pass  evaluation 
(assuming  reasonable  range).  This  can  be  the  case  of 

debris  or  dirt  spots  on  the  path  of  travel. 

(2)  Determine  the  general  overlay  of  the  object  by  finding 

Kmax  “ max  ^Hc*  V Drl’  Dr2’  °11’  °12^ 

for  example,  if  Kmax  = Hc,  then  the  object  has  a 
horizontal  overlay. 

(3)  Categorize  the  object  using  the  aspect  ratio  below 

_ max  [Vc,  Drl,  Pr2,  0,,,  D1?] 

p — 

max  [H  ] 

If  > 1.25  : tree  trunk,  fire  hydrant,  light  pole, 

etc. 

If  0.75  < Ar  < 1.25  : square  or  circular  shaped 

object. 

If  l\r  < 0.75  : curb,  step,  bench,  etc. 

(4)  Determine  a more  accurate  range  and  location  of  the 
object  with  respect  to  the  blind  using  a point  in  the 
object  whose  coordinates  (xp,  yp)  are  such  that  yp  is 
the  nearest  point  with  respect  to  the  blind,  and  Xp  is  a 
point  nearest  to  the  center  of  the  path  of  travel.  This 
is  achieved  using  equation  3.2  of  section  3.4. 
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(5)  The  approximate  size  of  the  object  is  estimated  using 
the  following  relation 

Sobj  = (Dobj}  [max  (»c)3 

where 

°obj  ■ Drl'  V 

A technique  such  as  this  can  be  extended  to  include  other  possible 
interpretations  depending  on  the  objects  considered  in  the  application. 
For  example,  if  the  Hc's  = 0 alternate  with  the  Hc's  * 0 this  could 
serve  as  an  indication  of  the  presence  of  a staircase  or  a striped 
crosswalk.  This  last  point  is  made  to  indicate  that  the  primitive 
description  provided  by  the  second-pass  evaluation  can  be  used  as  a 
feature  for  initiating  the  proper  recognition  algorithm  which  then  will 
identify  the  object.  Results  of  a computer  implementation  of  this 

second-pass  evaluation  technique  are  shown  in  Figure  4.11. 
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Figure  4.11 


Results  of  the  Second-Pass 
Evaluation  Technique 


CHAPTER  V 

IDENTIFICATION  OF  SHADOWS  AND  DANGEROUS  OBSTACLES 

A computer  vision  system  which  is  to  provide  enhanced  mobility  and 
safe  guidance  for  the  blind  must  be  able  to  identify  shadows,  detect 
depressions,  and  descriminate  upright  objects  from  flat  ones.  The 
reasons  for  addressing  these  problems  are  very  clear.  Without  the 
capability  of  discriminating  shadows  from  obstacles,  the  blind  person  is 
unnecessarily  forced  to  treat  the  shadow  as  an  obstacle  and  avoid  it; 
undetected  depressions  and  upright  objects  can  caus-e  physical  harm  to 
the  blind,  and  finally  the  capability  of  detecting  upright  objects  helps 
the  blind  to  locate  important  landmarks  such  as  a bus  stop,  a doorway,  a 
mail  box,  or  the  corner  of  a building.  In  this  chapter,  we  develop 

image  techniques  to  solve  these  very  important  problems. 

5. 1 Shadow  Identification 

In  the  2-D  image,  a shadow  is  easily  confused  with  objects,  and 
this  confusion  degrades  the  performance  of  vision  systems.  In  image 
processing  literature,  shadow  identification  has  been  recognized  as  an 
important  feature  for  the  interpretation  of  images  [56-58].  However, 
the  trend  has  been  to  consider  shadow  information  as  a useful  constraint 
rather  than  as  the  problem  itself.  On  the  other  hand,  although 
detection  of  shadow  has  been  identified  as  an  integral  part  of  the 
problem  addressed  in  [18-23],  it  is  either  ignored  or  simply  recognized 

as  a hindrance.  As  a result,  there  is  little  knowledge  on  the  shadow 
probl em. 
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In  this  section,  we  present  our  approach  to  shadow  identification. 
The  focus  in  this  approach  is  to  rely  on  the  characteri zati on  of  the 

inherent  effect  of  a shadow;  we  exploit  the  fact  that  an  image  region 
upon  which  a shadow  is  cast  preserves  its  intrinsic  characteri sties  by 
virtue  of  the  uniform  effect  of  shadow  on  its  gray  level  intensities 
[59].  The  characterization  of  the  effect  of  shadow  is  supported,  in 

this  approach,  by  four  interrelated  analyses,  namely,  (1)  the  histogram 
analysis,  (2)  the  pixel  intensity  distribution  (1-D  intensity  profiles) 
analysis,  (3)  the  correlation  analysis,  and  (4)  the  power  spectral 
analysis.  The  procedure  followed  is  to  apply  each  analysis  in  a 

sequential  fashion  starting  from  (1).  If  the  first  analysis  does  not 
yield  a definitive  answer,  the  second  analysis  will  be  conducted,  and  so 
on,  until  the  last  analysis  is  performed.  In  each  analysis,  performance 
parameters  are  determined,  and  they  are  compared  with  a set  of  empirical 
thresholds  for  shadow  identification.  At  any  stage,  if  a shadow  is 

positively  identified  the  shadow  identification  process  terminates. 

Let  us  assume  that  the  first-pass  evaluation  has  determined  the 
presence  of  an  object  along  the  path  of  travel  and  that  the  extent  of 
this  object,  i.e.  how  small  or  how  large,  is  known.  Moreover,  the 
primary  cues  of  the  decision  making  process  of  the  integrated  vision 
system  suggest  the  presence  of  a shadow. 

We  shall  first  describe  the  approach  for  shadow  identification  for 

the  case  where  the  detected  object  (possibly  a shadow)  covers  a 

relatively  large  area  and  has  a regular  shape.  Next,  we  show  that  with 

few  additional  preprocessing  steps,  this  same  approach  lends  itself  to 

the  case  where  the  object  has  an  irregular  shape.  This  will  cover  the 

case  of  a shadow  cast  by  a tree,  or  a shadow  cast  by  more  than  one 
obj  ect . 
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We  will  now  introduce  the  system  architecture  for  shadow 
identification.  For  each  of  the  four  analysis  in  the  architecture,  we 
first  discuss  their  performance  parameters  and  their  functions.  Next, 
we  describe  the  decision  making  process  that  evaluates  each  of  these 
performance  parameters  to  determine  if  a shadow  has  been  positively 
identified.  We  then  show  how,  through  additional  preprocessing  of  the 
image  upon  which  the  shadow  is  cast,  this  approach  is  made  to  cover  the 
case  of  irregularly  shaped  or  scattered  shadows.  Finally,  we  discuss 
the  limitations  of  this  approach  and  suggest  possible  means  to  overcome 
these  limitations. 

5.1.1  System  Architecture  for  Shadow  Analysis 

The  architecture  of  the  proposed  system  for  shadow  identification 
is  illustrated  in  Figure  5.1.  The  inputs  to  this  sytem  are  selected 
subregions  or  windows  of  the  image  under  consideration.  The  first  of 
these  windows  is  selected  from  the  obstacle-free  area  and  we  call  this 
window  WF.  The  second  window  is  taken  from  the  region  containing  the 
object  and  we  call  this  window  Wg.  The  third  window  is  selected  so  as 
to  enclose  partial  segments  of  both  regions  and  we  call  this  window 
WF0.  The  three  windows  are  assumed  to  be  MxM  arrays.  For  an 
illustration,  refer  to  Figure  5.2(a).  The  four  analyses  performed  by 
the  proposed  system  are  as  follows: 

A.  Histogram  Analysis 

This  analysis  generates  the  histograms  for  windows  Wp 
and  Wg,  and  from  each  histogram  the  following  parameters  are 
derived  to  identify  shadow. 
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(a)  Mean  value:  The  mean  value  of  a histogram  is  given 


k 

y = E g p(g) 

g=0 


(5.1) 


where  p(g)  = Ng/Ny  represents  the  probability  of  a pixel 
with  gray  level  value  g in  the  given  window  containing  a 
total  of  Ny  pixels.  The  parameter  k denotes  the 
quantized  gray  level,  which  in  the  case  of  a positive 
print  ranges  from  0 for  a blank  pixel  to  255  for  a very 
dark  pixel.  In  the  negative  print  this  order  is 
reversed.  This  mean  value  serves  two  purposes:  (1)  to 

detect  the  type  of  shift  that  has  taken  place.  If  we 

define  Up  as  the  mean  value  of  window  Wp,  and  yg  as 

the  mean  value  of  window  Wg,  then  by  comparing 

Up  with  Ug  , we  can  predict  if  the  shift  is  from  a 

light  area  to  a darker  area  or  vice  versa;  (2)  eliminate 
non-shadow  cases  by  comparing  yp  and  uQ  with  the 
minimum  and  maximum  mean  value  (u  • . u ) of  the 
shadow  effect;  this  will  be  discussed  later  in  section 


(b)  Coefficient  of  variation:  The  coefficient  of 

variation  measures  the  dispersion  of  the  histogram 
relative  to  its  central  value  or  mean  and  is  determined 
for  Wp  and  Wg.  This  coefficient  ? is  given  by 


5.1.2. 


a 


U 


(5.2) 
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The  value  of  a , the  standard  deviation,  can  be  used  as 
an  initial  indication  of  the  uniformity  of  the 
considered  window.  A smal 1 value  of  a indicates 
uniformity  in  the  window,  and  a large  value  of  a implies 
non-uniformity  in  the  window. 

(°)  Skewness  coefficient:  The  skewness  coefficient  41 

is  a parameter  which  measures  the  degree  of  asymmetry  of 
the  histogram,  and  is  given  by 


s 


1 

T 

a 


k 

[ £ 
g=0 


(g  - ur  p( g ) ] 


(5.3) 


where  s represents  the  skewness  or  the  third  central 
moment.  This  parameter,  too,  is  determined  for  Wp  and 

w0. 

8-  Pixel  Intensity  Distribution  Analysis 

The  pixel  intensity  distribution  (PID)  is  simply  a 1-D 

intensity  profile  of  a record  (horizontal  scan)  of  the  given 

window  for  a fixed  y coordinate.  For  example,  the  PID  at  a 

given  coordinate  y^  is  given  by  P(x,y|<),  where  x = 0,  1,  ..., 

M-l.  Each  P(x,y)  is  simply  the  gray  level  value  at  point 

( x ,y ) . The  objective  of  this  analysis  is  to  determine 

continuity  in  these  intensity  profiles  going  from  the 

obstacle-free  area  to  the  area  where  an  object  was  found. 

This  analysis  is  performed  for  the  window  Wpg.  The 

underlying  idea  is  that  if  the  area  of  the  object  were 

actually  a shadow,  then  we  would  expect  continuity  in  these 

PID's.  The  procedure  considered  in  this  analysis  is  as 
fol 1 ows : 
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(1)  Select  an  equal  number  of  PID's  from  both  the  obstacle- 
free  area  and  the  area  with  the  object  from  the  window 
WF0.  We  call  these  PID’s  PF(x,y)  and  PQ(x,y), 
respectively. 

(2)  Find  the  pairwise  correlation  R between  the  PF(x,yF)'s 
and  the  Po*(x,yo)'s  using  the  following  equation:' 


M-l 

J PF(x,yF)  P0*(x  - «j,  y0  - «2) 

rpfp0*(W  ■ tn — ; (5.4) 

z PF  (x,yF) 
x=0  h h 


Pi ( x »y -j ) is  simply  the  gray  level  value  at  point  (x,y.j) 
in  region  i.  Parameters  5X  and  $2  are  shifts  in  the 
x,y  coordinates,  respectively,  where  = 2,  4,  ...,  M, 
and  <5^  = 0 to  save  processing  time  in  our  analysis. 
Parameter  Pg*  is  the  normalized  Pg  to  compensate  for  the 
shadow  effect  and  is  given  by  PQ*  = k Pg  where 

k = wPp^  ^Pg  ^ c^an9e  is  from  a light  area  to  a 

darker  area  and  k = yp  / yp  for  the  reverse  chanqe. 

kF 

Parameters  up  and  up  are  the  mean  values  of  Pr  and 

rF  K0  h 

Pg,  respectively. 

After  the  pairwise  correlations  are  obtained,  then 
we  determine  the  maximum  correlations  R*  using  the 
following  equation: 


R*  = max[R  *(5  6 )] 

KF  K0  1 L 


(5.5) 
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C.  Correlation  Analysis 

Determine  the  correlation  between  the  windows  Wp  and 
Wq.  In  this  analysis,  the  following  measure  is  used  to 
identify  shadow: 

Correlation  measure  (R):  The  correlation  R for  any  two 

windows  Wj  and  W2  is  given  by 


R, 


wxw2(  W 


M-l  M-l 

Z Z 
x=0  y=0 


Wj(xty)  W2(x  - fij, 

' M-l  M-l  “ 

Z 2 W (x,y) 
x=0  y=0  1 


(5.6) 


where  Wj  and  W2  are  assumed  MxM  arrays.  W^x.y)  is 
simply  the  gray  level  value  at  point  (x,y)  for  a window 
wi • Parameters  <5^  and  <$2  are  the  shifts  as  defined  in 
step  B.  The  best  correlation  which  is  that  value 

closest  to  unity  is  chosen.  We  determine  values  of  R 
for  each  pair  from  Wp,  WQ,  and  WpQ. 

(D)  Power  Spectral  Analysis 

This  step  is  carried  out  as  a last  resort  to  strengthen 
the  shadow  identification  process  when  the  previous  analyses 
have  all  yielded  ambiguous  results.  One  of  the  attractive 
features  of  the  Fourier  power  spectrum  is  that  its  domain  of 
analysis,  the  frequency  domain,  unlike  the  spatial  domain,  is 
insensitive  to  small  amounts  of  noise  and  to  the  slight 
effects  of  perspective. 

The  two-dimensional  Fourier  transform  of  an  image  window 
W( x ,y)  is  given  by 
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W(u,v)  = F {W( x,y ) } = I [V  V W(x,y)  e'2irj(ux  + V^)/M]  (5.7) 

x=0  y=0 


where  u,v  = 0,  1,  2,  ...,  M-l  are  the  harmonically  related 
frequency  variables  and  j = / -1  . The  power  P(u,v)  and  the 
energy  E(u,v)  spectra  are  given  by 

P(u,v)  = | W( u , v ) | = [R2(u,v)  + I2(u,v)]1/2  and  E(u,v)  = |W(u,v)|2 

where  R and  I are  the  real  and  imaginary  components  of  W, 

respectively.  Various  studies  [60-62]  reveal  the  efficiency 

of  the  Fourier  analysis  in  texture  discrimination.  They  also 

reveal  that  there  is  no  distinct  way  to  compare  power 

spectra.  Suggested  ways  of  comparison  are  the  radial  or 

angular  bins  [60],  or  the  statistical  analysis  of  the  power 

spectra  in  their  two-dimensional  polar  transformation  [61]. 

Our  approach  consists  of  filtering  out  the  low  energy 

components  and  correlating  the  high  energy  components. 

Experimental  observations  reveal  that  it  is  the  high  energy 

components  which  are  most  indicative  of  the  window's 

intrinsic  characteri sties.  The  correlation  measure  used, 

after  filtering  the  low  energy  components,  is  R_  (0,0) 

SFS0 

where  Sp  and  SQ  denote  the  filtered  power  spectra  of  the 
windows  Up  and  Wg,  respectively. 
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5*1.2  Evaluation  of  the  Performance  Parameters 

In  this  section,  we  describe  the  decision  making  process  that 
evaluates  the  parameters  of  the  four  analyses  mentioned  above,  to 
determine  if  a shadow  has  been  identified. 

The  procedure  is  as  follows: 

(1)  Comparing  the  histograms  of  WF  and  Wn: 

(a)  Analyze  the  mean  values  of  Up , yQ  with  respect  to 

Umin  and  wmax  * Parameters  umin  and  umax  are  the 

mean  values  characteristic  of  the  lightest  and 

darkest  effects  of  shadow,  respectively. 

Experimentally  y = 50  and  y = 200  . If 

the  object  area  has  a mean  gray  level  value  smaller 

than  the  minimum  effect  of  shadow  ( y.  < y . ) or 

0 mm'  ’ 

if  the  object  area  mean  gray  level  value  exceeds 
the  maximum  effect  of  shadow  ( yn  > y ) then  a 

non-shadow  case  is  declared  and  the  process  for 
shadow  analysis  ends. 

(b)  If  the  previous  step  does  not  yield  a positive 

identification  then  compare  the  coefficients  of 
variations  (?)  and  the  skewness  coefficients 

('!>)  . Due  to  the  extreme  sensitivity  of  these  two 
measures  to  gray  level  variations,  we  consider  two 
types  of  environments: 

A constrained  environment:  For  an  indoor 

environment  or  a well  controlled  environment,  the 
conditions  | ?p  - ?Q|  < 0. 10  and  |*p  - ^ | 

< 0*10  indicate  a shadow.  On  the  other  hand,  if 
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I Sp  ~ 50l  * 0.20  or  if  and  have 

opposite  signs,  the  process  terminates  with  a non- 
shadow case  declared.  For  values  of 

» 4q»  and  that  satisfy  the  conditions 
0.10  < | cF  - s0|  < 0.20  and  0.10  < | ipp  - ^Q| 

< 0.20  the  comparison  results  are  considered 
ambiguous,  and  step  2 is  initiated. 

For  a non-constrained  environment:  Unless  a 

very  good  correlation  exists  between  the  values 
?P  and  and  between  i|>p  and  <pQ  , step  2 below 
is  initiated.  This  is  due  to  the  fact  that  these 
two  measures  are  extremely  sensitive  to  noise 
present  in  the  outdoors.  Histogram  examples  for 
three  different  scenes  are  illustrated  in  Figures 
5.2(b)  - 5.4(b). 

(2)  Comparing  the  PIP 1 s selected  from  Wpn 

Using  the  value  of  R*  given  by  equation  (5.5),  we 
can  evaluate  the  correlation  that  exists  between  the 
selected  PID's.  The  following  condition 

0.85  < R*  < 1.15  indicates  a good  correlation. 

Condition  |1  - R*|  < 0.30  indicates  no  correlation. 
Values  of  R*  in  between  these  ranges  are  ambiguous.  If 
the  majority  of  the  PID's  (more  than  75%)  correlate, 
then  this  implies  continuity  in  the  surface 
characteristics,  and  a shadow  case  is  declared  and  the 
evaluation  terminates.  If  more  than  50%  of  these  PID's 
do  not  correlate,  a non-shadow  is  declared  and  the 
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Figure  5.2  Shadow  Case-Outdoor  Scene 


Figure  5.3  Non-Shadow  Case-Outdoor  Scene 
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Figure  5.4  Shadow  Case-Controlled  Environment 
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(3) 


process  for  shadow  identification  terminates.  Cases 
between  are  considered  ambiguous,  and  step  3 
initiated.  PID  examples  are  shown  in  Figures  5.2(c) 
5.4(c). 

Determining  the  correlation  of  the  windows  WF,  Wn,  and 

Wed. 


in 
i s 


Using  the  correlation  measure  from  equation  5.6,  we 
evaluate  e(W1,W2)  for  each  pair  of  windows  (W1,W2)  at 
a time,  where  e(W^,W2)  is  given  by 


e(W1,W2)  = |1  - max[Rw  w ( «1,62)3|  (5.8) 

where  e(W1#W2)  is  evaluated  for  each  pair  of  windows 
from  WF,  W0,  and  Wpo.  If  e(wp,WF0)  < 0.20  and 
e(Wpo»  Wg)  < 0.20  there  is  continuity  in  the  surface 
characteristics,  and  a shadow  case  is  declared. 

On  the  other  hand,  if  e(WF,  Wp0 ) > 0.40  and 
e( Wfo ,Wq)  > 0,40  » a non-shadow  case  is  declared  and 
the  process  for  shadow  identifications  terminates. 

Cases  that  range  in  between  the  above  two  conditions  are 
considered  ambiguous,  and  step  4 is  initiated.  The 
various  values  obtained  for  Figure  5.2  - 5.4  are  shown 
in  Table  5.1. 

( ^ ) Determining  the  correlation  between  the  power  spectra 

The  power  spectra  in  question  are  the  Fourier  power 
spectra  with  the  low  energy  components  filtered  out.  We 
call  Sp  the  spectra  of  Wp,  and  Sq  the  power  spectra  of 
Wg.  In  this  step,  using  the  correlation  measure  R,  we 


Table  5.1  Results  of  shadow  Analysis  for  Input 
Images  Shown  in  Figures  5.2  - 5.4 


85 


CJ  <✓> 
<D  -r- 
Q_  tO 

go 


<u  c 
5 < 
O 


cn  co 
o 


to  ro 

tO  r-H 


ro  >> 

<1)  03 

s-  c 
i-  C 
o 

CJ 


r**.  o 

O C\J 


cnj 
CO  CO 


>> 

to 

c 

< 

o 

o_ 


»-H 

cn  cn 


cnj  cn 
c*. 


i-H  C\J 

cn  cn 


T3 

to  CJ 

5 

O 03 
T3  <— 
C QJ 
•*-  C_ 
3 S- 
O 
CJ 


cnj  co 
3 3 


O 

U.  O 


to 

<D 

S- 

3 


cn 

o 


rH  CO 

to  o co 

O O r-t 
I 

cn  co  *3- 

*“ < CNJ  r—i 

o o o 


r>*  to  o 
O CO  co 


CO  to  co 
r*«.  cn  c\j 


Ll.  Lu  o 


3 3 3 


to  CO  o 

r— « *— I O 


O r—i 

CM  UO 


csj  uo 

r-H  i— ( 

• * • 

o o o 


UO  r-H  *— I 

0) 

L. 

3 

cn 


O r-H 

CO  CNJ  CO 


Cn.  CO  CO 
H to  to 


■CJ-  CNJ 
CO  CNi 


i o 
cn  cnj 


U_  Li_  O 


86 


determine  the  value  R_  _ (0,0)  , i.e.  with  no  shifts 
considered.  This  is  a direct  superposition  of  one 
filtered  power  spectra  over  another.  In  this  case,  if 
0,75  S (°»°)  < 1,25  » correlation  exists  and  a 
shadow  case  is  declared.  If  the  condition  above  is  not 
satisfied,  the  overall  approach  for  shadow 
identification  terminates  with  a non-shadow  case 
declared.  This  range  of  correlation  may  seem  large,  but 
experimentally,  it  is  a reasonable  range  and  this  is  due 

to  the  fact  that  power  spectra  have  complex  energy 

spreads.  Examples  of  filtered  power  spectra  are  shown 
in  Figures'  5.2(d)  - 5.4(d). 

5*1*3  Extension  of  this  Analysis  to  Irregularly  Shaped  and  Scattered 
Objects  (Shadows)  — 

In  the  outdoor  scenes,  shadows  are  of  all  shapes.  Shadows  that  are 
cast  by  buildings,  large  man-made  objects,  etc.,  have  in  general  regular 
shapes  and  extend  over  a large  area.  These  can  be  identified  directly 
by  the  above  approach.  Unfortunately,  we  also  have  shadows  that  are 
cast  by  trees,  sign  posts,  etc.,  which  have  irregular  shapes  or  are 
scattered  along  the  path  of  travel.  These  shadows  constitute  a very 

difficult  problem;  however,  it  does  appear  to  have  a solution.  The 

aproach  for  this  problem  involves  a preprocessing  of  the  area  containing 
the  irregularly  shaped  object  (shadow).  This  preprocessing  takes  the 

following  steps: 

(1)  Extract  this  irregularly  shaped  or  scattered  object  from 
the  path  of  travel.  This  is  achieved  using  a similar 
principle  underlined  in  section  4.3,  i.e.  to  find  a gray 
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level  threshold  separating  object  from  the  obstacle-free 
area.  This  threshold  we  denote  by  T$  is  computed  as 

Ts  CGp  + max(Go(k,1))] 

max( G0( k , 1 ) ) is  the  cell  in  the  object  area  that  has  the 
largest  deviation  in  gray  level  from  the  optimal 
regional  average  gray  level  Gr. 

To  extract  the  object,  the  following  points  are 

considered: 

(a)  If  max(G0(k,l))  < Gp,  then  all  pixels  whose  gray 

levels  are  less  than  the  value  T^  are  preserved, 

the  others  are  zeroed. 

(b)  If  max(GQ(k,l))  > Gp,  then  all  pixels  whose  gray 

levels  exceed  the  value  Ts  are  preserved,  the 
others  are  zeroed. 

Steps  (a)  and  (b)  insure  that  this  irregular  shaped 
object  is  extracted  regardless  whether  it  is  lighter  or 
darker  than  the  obstacle-free  area. 

(2)  Once  the  object  region  is  extracted,  divide  each  pixel 
gray  level  value  by  Gp.  If  the  majority  of  these  ratios 
(>  75%)  have  approximately  the  same  value,  call  this 
value  Gopt.  If  not,  the  surface  is  inconsistent  and  the 
process  for  shadow  identification  terminates  since  it 
would  be  difficult  to  carry  out  the  shadow 

identification  procedure.  If  the  value  for  Gopt  is 

determined  then  perform  step  3 below. 
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(3)  Divide  each  pixel  gray  level  value  by  the  value  Gopt. 

This  step  will  make  the  object  area  look  exactly  like 
the  obstacle-free  area. 

Now,  to  make  sure  that  the  gray  level  effect  that  we  have  just 
eliminated  is  indeed  that  of  a shadow,  the  procedure  of  shadow  identifi- 
cation starting  from  step  2 (comparing  the  PID's)  is  reconducted.  In 
this  revision,  the  correlation  results  should  improve.  Computer  results 
on  outdoor  scenes  using  this  procedure  are  illustrated  in  Figure  5.5. 

5.1.4  Limitations  and  Suggestions  for  Future  Research 

There  are  certain  limitations  to  even  the  extended  approach 
described.  Unless  additional  information  is  provided,  the  following 
cases  pose  problems. 

(1)  Narrow  or  small  shadows  - The  problem  in  this  case  is 

related  to  the  fact  that  threshold  T$  cannot  be 
accurately  estimated.  The  extraction  process  is  not 

possible  in  this  case. 

(2)  Shadows  cast  on  surfaces  which  have  random  texture  or 

are  marked  by  various  i rregul arities  in  their  intrinsic 
characteristics  will  disturb  all  the  correlation 

measures. 

(3)  Dark  shadows  - Since  any  form  of  intrinsic 

characteristics  are  cancealed,  this  is  a difficult  task 
even  for  the  human  eye. 

(4)  A shadow  cast  on  an  object  - A shadow  cast  on  an  object 

is  an  object  in  our  analysis  and  cannot  be 

differentiated  from  the  object. 
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(a)  Input  Images 


(b)  Output  Images (Shadows  Removed) 


Figure  5.5  Removal  of  Scattered  Shadows 
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Possible  avenues  to  approach  these  problems  are  (a)  to  use  context 
in  the  scene  [63]  with  the  possibility  of  deriving  the  position  of  the 
light  source  from  a previously  identified  shadow;  (b)  use  knowledge- 
based  pattern  recognition  [64]  by  supplementing  the  identification 
process  by  knowledge  such  as  shadows  are  not  free-standing;  i.e.  their 
contours  always  extend  in  some  direction  towards  the  object  which  cast 

them;  (c)  to  make  use  of  the  information  which  can  be  provided  by  a 

photodiode  which  measures  the  change  in  ambient  light,  or  some 
electromagnetic  device  to  measure  the  reflected  portion  of  the  signal  it 
emits,  where  now  the  blind  has  an  approximate  idea  on  where  to  point  out 
this  device  since  the  object  location  is  determined  by  the  vision 

system,  and  (d)  in  difficult  cases  such  as  that  of  surfaces  with 

irregular  intrinsic  characteristics,  the  only  solution  is  to  determine 

whether  the  object  is  actually  flat  or  not  (this  is  addressed  in  section 
5.3). 


5.2  Detection  of  Depressions 

Depressions  or  drop-offs  constitute  a serious  obstacle  for  the 
blind.  Unfortunately  the  detection  of  depressions  is  also  a complex 
image  analysis  problem.  In  the  human  vision  system,  many  visual  cues 
such  as  stereopsis,  occlusion  cues,  context  in  the  viewed  scenes,  change 
in  textural  properties,  etc.,  are  all  interpreted  and  integrated  with 
relative  ease  to  yield  an  almost  effortless  perception  of  what,  in  fact, 
is  a complex  perceptual  task.  In  image  processing,  however,  a computer 
implementation  exploiting  any  one  of  the  above  cues  becomes  a complex 
information  processing  problem  [65-67]. 
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Clearly,  there  is  no  simple  way  to  solve  this  problem.  In  our 
approach,  we  attempt  to  extract  occluded  information  from  a sequence  of 
frames.  This  is  based  on  the  principle  that  if  one  is  to  approach  a 
depression  or  a drop,  one  is  bound  to  see  new  information  that 
previously  was  occluded.  This  task  which  necessitates  analysis  of  a 
sequency  of  frames  requires  image  correspondence.  In  our  analysis,  the 
constraints  of  the  image  correspondence  process  are  somewhat  relaxed, 
since  we  are  not  concerned  about  the  recovery  of  the  third  dimension; 
instead  we  are  simply  concerned  about  locating,  approximately,  a 
location  of  interest  in  two  different  frames  for  the  purpose  of 
extracting  occluded  information. 

Let  us  assume  that  the  first-pass  evaluation  has  detected  an  object 

along  the  path  of  travel  and  has  estimated  its  range  (Rg)  from  the  blind 

person.  Let  us  also  assume  that  the  shadow  identification  process  did 

not  positively  identify  a shadow.  Given  this  scenario,  the  blind  person 

takes  a few  steps  (a  fraction  of  Rg)  closer  to  the  object  and  takes 

another  picture.  Ideally,  at  this  new  position  which  is  closer  to  the 

object,  if  concurrent  processing  were  available,  then  all  three  major 

analyses  (shadow,  depression,  and  flat  vs.  upright)  should  be  initiated 

simultaneously.  In  the  present  sequential  format  of  scene  analysis, 

however,  the  integrated  vision  system  must  look  for  primary  cues  which 

suggest  the  existence  of  a drop  or  a depression.  We  now  describe  the 

approach  for  the  detection  of  depressions.  The  description  starts  with 

a presentation  of  the  image  correspondence  process  which  allows  the 

system  to  analyze  the  same  location  in  the  two  frames  obtained.  We  then 

describe  the  scheme  used  to  detect  both  the  primary  cues  for  the 

integrated  vision  system,  and  the  occluded  information  for  the  detection 
of  a depression  using  the  two  frames. 
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5.2.1  Image  Correspondence  Process 

For  the  image  correspondence  process,  we  derive  equations  for  the 
correspondence,  in  both  range  and  width,  of  any  two  image  points,  going 
from  one  frame  to  the  next. 

(!)  Range  correspondence:  The  objective  here  is  to  find  how 

two  points,  yjj  and  yj2  , in  the  vertical  axis  of  the 

first  frame  map  into  points,  y^  and  y^2  , in  the 

second  frame.  The  difference  in  the  range  of  the  two 

frames  is  rx.  The  superscript,  1 or  2,  denotes  the 

frame  identity.  Coordinate  yj1  is  the  point  where  the 

object  area  starts,  and  point  yj-  is  an  arbitrary 

point  a small  distance  away  from  yj^  . The  distance 

between  these  two  points  depends  on  the  extent  of  the 

detected  object  (see  Figure  5.6).  For  simplicity,  let 

us  assume  that  the  tilt  angle  a of  the  camera  plane  is 

zero.  To  find  the  mapping  between  points  (y*  vL) 

J 1 1*  J 12' 

2 2 

an<^  ^11*  y-j 2 ) * we  use  equation  3.6  derived  in  section 
3.4  Using  this  equation,  for  a point  at  range  R(yk) 
from  the  picture  plane,  its  projection  yk  on  the  picture 
plane  is  given  by 


yk  = fh  CR(yk)  - L]  / CLR(yk)] 


Applying  this  equation,  we  obtain  the  following 
relationships: 


(5.10) 
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Camera  Camera 

Position  1 Position  2 


/\ 
t \ 

l . i 


(b) 

Figure  5.6  Image  Correspondence  for  Extracting  Occluded 
Information,  a)  Correspondence  in  Width;  b) 
Correspondence  in  Range;  c)  Mapping  the  Refe- 
rence Points;  d)  Occluded  Information  Extraction 
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(c) 


Figure  5.6--continued 
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yU  = fh^(y}i)  - L]  / CLR(yj1)]  ( 

yll  = fhtR(y}i)  - r*!  - L]  / D-Wy^)  - r^]  ( 

where  R(y-|^)  estimated  by  the  first-pass 

evaluation.  Similarly,  since  yj2  is  arbitrarily 
chosen,  its  range  R(yj2)  is  determined  using  equation 
(3.2)  in  section  3.4.  We  thus  obtain 

yi2  = fhtR(y}2)  • L]  / [LR(y}2)]  c 

y12  = fh^(yJ2)  " r1  ~ L]  / CL(R(y}2)  - rx)]  (< 

(2)  Width  correspondence:  In  order  to  save  processing  time, 

it  is  useful  to  focus  only  on  a certain  width  of  the 
image  where  the  object  is  detected.  So,  if  we  choose  a 
segment  of  width  delimited  by  and  in  the  first 
frame,  we  need  to  find  their  corresponding  projections 
in  the  second  frame.  This  correspondence  is  found  using 
equation  (3.3)  derived  in  section  3.4.  By  substituting 
R(y11)  for  f + R(yQ,yi)  , we  obtain  the  relation 

fw(  x|<i  >xk2) 


5.10a) 

i.lOb) 

.11a) 

.lib) 


(5.12) 
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where  the  y coordinate  for  xkl  and  xk2  is  yu.  Thus, 
for  the  first  frame,  we  have 


fW(xki,xJ2) 

R(yji) 


(5.12a) 


For  the  second  frame,  we  have 


Xkl  ‘ xk2  = I (5.12b) 

(R(yu)  - rx) 

Using  relations  5.10  through  5.12,  the  vision  system  can 
locate  the  same  location  in  a subsequent  frame.  Figure 
5.6  illustrates  an  example  of  this  image 
correspondence.  Note,  that  if  more  accuracy  in  the 
results  is  desired,  a pedometer,  which  can  be  interfaced 
to  the  vision  system,  will  provide  an  accurate 

estimation  of  the  value  r^  which  separates  the  two 
frames. 

5*2.2  Extraction  of  the  Occluded  Information 

Occluded  information  that  is  revealed  in  a subsequent  frame  can  be 
perceptually  very  deceptive  (see  Figure  5.7).  This  results  from  the 
fact  that  if  we  cannot  locate  the  same  point  of  reference  in  the  two 
frames  then  we  may  conclude  that  there  is  no  relationship  between  these 
two  frames.  This  stresses  the  importance  of  a reference  point  from 

which  the  system  starts  to  look  for  occluded  information;  in  our 
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Figure  5.7  Need  for  a Point  of  Reference  to  Extract 
Occluded  Information 
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analysis  this  reference  point  is  chosen  in  the  proximity  of  the  object 
as  indicated  by  the  first  pass  evaluation.  Moreover,  the  detection  of 
occluded  information  will  disturb  many  of  the  physical  relationships 
that  previously  existed  between  the  various  elements  in  the  given 
scene.  Utilizing  these  two  ideas,  we  describe  two  simple  methods  to 
extract  this  occluded  information. 

Method  1 

Step  1:  Take  a vertical  scan  from  point  y^  to  point  yj2 

to  generate  a vertical  pixel  intensity  distribution 
or  profile  (PID).  We  call  this  profile  P1 

Similarly,  generate  a vertical  pixel  intensity 
profile,  P^112  , between  y^  and  y*2  . 

Step  2:  Locate  the  number  of  major  disturbances  (peaks)  in 

2 

the  profile  P-|112  when  compared  with  the  peaks  in 
PU12  * If  there  exists  a difference  in  the 
number  of  major  disturbances  in  the  two  profiles, 
then  this  implies  the  detection  of  the  occluded 
information.  By  major  disturbance  or  peak,  we  mean 
a point  in  the  profile  whose  gray  level  value 
exceed  the  value  Pmax  given  by 


for  the  same  profile.  Parameters  up  and  ap  are 
the  mean  and  standard  deviation  of  the  intensity 
profile,  respectively.  The  definition  of  a major 
disturbance  is  obtained  from  study  [68].  This 
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method  is  used  by  the  integrated  vision  system  for 
determining  the  primary  cues. 

Method  2 

Step  1:  Obtain  a few  horizontal  scans  between  points  x1 

1 . kl 
and  of  the  first  frame  starting  at  the 

vertical  coordinate  y^  . We  call  these  intensity 

profiles  pJlk2(i)  » where  i denotes  the  i-th 

scan.  Obtain  similar  scans  from  the  second  frame, 

using  the  coordinate  y^  as  the  starting  point 

and  xkl  and  xk2  as  the  horizontal  limits.  We 

call  these  intensity  profiles  P,2,,„(i) 

k lk2 v ' * 

Step  2:  As  in  step  2 of  the  previous  method,  locate  a 

difference  in  the  number  of  major  peaks  appearing 
2 

in  Pklk2(i)  when  compared  to  P‘lk2(1)  . If 

peaks  are  found,  occluded  information  is 

detected.  This  method  confirms  the  results 
obtained  using  the  previous  method. 

Computer  examples  of  this  procedure  are  illustrated  in  Figure 
5.8.  Note,  that  when  no  occluded  information  is  found  in  this  analysis, 
the  object  remains  a potential  obstacle,  and  the  upright  versus  flat 
object  analysis  described  in  the  next  section  must  be  initiated. 

5*3  Discrimination  of  Upright  Objects  from  Flat  Objects 

Distinguishing  an  upright  object  from  a flat  object  is  essential  in 
the  design  of  a guidance  system.  The  loss  of  the  third  dimension  in  the 
2-D  image  obtained  from  the  camera,  however,  results  in  this  task 
becoming  a complex  image  analysis  problem. 
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Figure  5.8  Extraction  of  Occluded  Information  in  a Depression,  a)  Input  Imane  A 
b)  Close  Range  of  Input  Image  A;  c)  Horizontal  Intensity  Profiles  of 
the  Input  Images;  d)  Vertical  Intensity  Profiles  of  the  Input  Images 
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(c) 


Figure  5.8— continued 
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In  this  section,  we  first  make  a few  experimental  observations, 
contrasting  the  image  projections  of  an  upright  object  with  a flat 
object.  Then,  from  these  individual  projections,  we  show  how  the 
distinctive  characteri sties  of  the  projections  can  be  exploited  to 
obtain  a general  technique  to  solve  this  problem.  In  exploiting  some  of 

these  observations,  we  assume  that  a robust  image  registration  algorithm 
can  be  adopted. 

In  these  observations,  we  let  h denote  the  height  of  the  camera 

from  the  ground  plane,  3 is  the  angle  between  the  central  axis  of  the 

camera  and  the  incident  ray  pointing  at  the  first  point  viewed  by  the 

camera.  These  observations  focus  on  the  vertical  projections  of  the 

objects  on  the  picture  plane.  The  horizontal  projections  are  irrelevant 
to  this  analysis. 

Observation  1:  Upright  objects,  unlike  flat  objects,  are  not 

affected  by  the  perspective  effect. 

Observation  2:  If  the  camera  height  h is  smaller  than  the 

height  of  the  object  (in  the  case  of  an  upright  object)  or 
the  length  of  the  object  (in  the  case  of  a flat  object),  and 
the  angle  3 < 90°  (which  in  our  case  is  always  true),  it  is 
observed  that  the  extreme  edge  points  of  an  upright  object 
will  extend  towards  the  top  of  the  picture  plane  when  the 
camera  moves  towards  the  object.  On  the  other  hand,  the 
extreme  edge  points  of  a flat  object  will  actually  recede 
away  from  the  top  of  the  picture  plane  in  the  same  situation. 
Observation  3:  Let  us  consider  two  objects,  one  flat,  the 

other  upright,  where  both  are  at  an  initial  range  Ri  from  the 
To  stress  an  important  point  in  this  observation, 


camera. 
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the  flat  object  is  chosen  such  that  its  length  is  longer  than 
the  height  of  the  upright  object.  Let  us  denote,  by  If  and 
IU*  the  Picture  plane  projection  of  the  flat  and  upright 
objects,  respectively.  Now,  keeping  parameters  h and  a 
constant,  and  by  moving  the  camera  closer  to  the  two 
equidistant  objects,  we  observe  that  if  at  the  selected 
initial  range  Ri , Iu  > If  then,  as  we  approach  the  objects, 
there  is  a point  Px  with  range  Rl  where  Iu  = If.  From  Pl,  as 
we  move  closer  to  the  objects,  we  now  observe  Iu  < 1^-  (refer 
to  Figure  5.9). 

From  the  first  observation  we  deduce  that  lateral  edges  of  upright 
objects  that  are  straight  vertical  edges  will  remain  straight  vertical 
edges  in  the  image  plane.  From  the  second  and  third  observations,  we 

deduce  that  the  projection  of  an  object  in  the  picture  plane  is 
proportional  to  the  depth  of  field  it  occludes.  From  the  second 
observation,  the  depth  of  field  occluded  from  the  camera  view  is  of 
infinite  measure,  thus  the  object  will  always  remain  in  the  image  plane 
when  the  camera  approaches  it.  From  the  third  observation,  the 

variations  in  the  projections  is  due  to  the  fact  that  the  depth  of  field 
occluded  by  an  upright  object  is  a function  of  the  range  it  is  viewed 
from  (assuming  its  height  is  smaller  than  the  height  of  the  camera). 

5,3,1  Proposed  Technique  to  Detect  Straight  Vertical  Edges 

In  this  technique,  we  make  recourse  to  edge  detection  which  is  a 
fonr,  of  image  segmentation.  There  are  various  edge  detection  techniques 
which  can  be  used  in  this  case  [69].  Since  we  are  only  interested  in 
the  vertical  edges  of  the  image,  we  have  devised  a specialized  edge 
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Figure  5.9  Picture  Plane  Projections 
of  Upright  and  Flat  Objects 
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detection  scheme  which  makes  use  of  the  first  derivative  of  the  gray 
level  intensities.  In  mathematical  terms,  this  first  derivative  is  the 
rate  of  change  of  a mathematical  expression.  In  the  digitized  image, 
this  derivative  reduces  to  the  difference  in  gray  level  that  exists 
between  two  adjacent  pixels.  This  difference  is  used  to  evaluate  the 
type  of  discontinuity  in  intensity  between  the  two  pixels.  A large 
discontinuity  is  a sign  that  an  edge  point  may  exist.  This  is  a purely 
subjective  estimation  of  a large  discontinuity. 

Our  approach  to  detect  the  vertical  edges  comprises  two  steps: 

(1)  The  first  derivative  is  determined,  pairwise,  for  all 
pixels  in  a horizontal  scan  of  each  line  (record)  of  the 
image.  Each  time  the  derivative  which  in  this  case  is 
the  difference  in  gray  level  between  two  pixels,  i.e. 

Igi  ‘ gi+il  exceeds  a set  threshold,  Te,  point  gi+1  is 
considered  a potential  edge  point.  This  process  is 

repeated  for  all  records  of  the  image.  To  make  sure 
that  we  extract  all  potential  edges  the  threshold  Te  is 
set  to  a small  value.  The  range  10  < T < 20  is  a 
good  range  for  the  threshold.  Since  the  gray  level 
range  in  an  image  is  between  0 to  255,  this  first  step, 
with  such  a low  threshold  Te,  is  bound  to  extract 
various  non-edge  (noise)  points.  This  problem,  however, 
is  easily  eliminated  in  the  second  step. 

(2)  In  this  step,  since  the  focus  is  on  the  vertical  edges 
only,  the  system  extracts  only  those  points  with  the 
same  horizontal  coordinate  x.  The  potential  vertical 
edges  will  be  selected  only  if  they  are  at  least  4 
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pixels  long  (in  the  vertical  direction).  All  other 
potential  edge  points  are  considered  noise  points. 

The  results  of  these  two  steps  are  illustrated  in  Figure  5.10.  The 
locations  of  these  upright  objects  are  easily  determined  by  their  (x,y) 
coordinates  which  are  conveyed  to  the  blind  using  the  mapping  principles 
underlined  in  section  3.4. 

5,3,2  .Propped  Technique  to  Assess  the  Picture  Plane  Projections  of 
Flat  uojects  and  Upright  Object? 

The  measurement  of  these  projections  necessitates  the  generation  of 
the  binary  image  to  allow  for  easy  extraction  of  the  vertical 
extremities  which  we  denote  here  by  yi  and  yj . With  reference  to  the 
second  observation,  the  measured  difference  between  the  top  of  the  frame 
and  the  upper  extreme  edge  of  the  object  in  the  image  plane  is  obtained 
using  a vertical  scan.  If  this  measured  difference  actually  decreases 
in  the  next  frame,  the  object  is  assumed  an  upright  object.  In  the  case 
where  the  object  initially  covers  the  whole  plane  of  the  image  in  the 
vertical  direction,  we  can  also  safely  assume  that  the  object  is  an 
upright  object  (e.g.  close  range  image  of  an  electric  light  pole). 

Interpreting  the  results  of  observation  3 leads  to  a complex 
problem.  A solution  to  this  problem  requires  precise  comparison  of  the 
resulting  projections.  It  is  found  that  the  projection  of  a flat  object 
on  the  image  plane  is  proportional  to  its  real-world  length.  As  a 
result,  as  we  approach  a flat  object  there  is  an  increase  in  the 
vertical  projection  of  this  object  on  the  image  plane;  the  increase  is 
directly  proportional  to  the  actual  distance  moved  by  the  camera.  When 
this  relationship  holds  for  an  object,  we  can  deduce  that  this  object  is 
actually  flat.  In  the  case  of  an  upright  object,  however,  this 
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Input  Image  A 


Edge  Detection  Using  Detection  of  Straight 

the  First  Derivative  Vertical  Edges  Only 


(a) 

Figure  5.10  Detection  of  Upright  Landmarks,  a)  Input 
Image  A Example;  b)  Input  Image  B Example 
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Input  Image  B 


Figure  5. 10--continued 


109 


relationship  does  not  hold;  this  is  due  to  the  fact  that  the  picture 
plane  projection  of  an  upright  object  is  proportional  to  the  depth  of 
field  it  occludes,  and  for  an  upright  object,  this  occluded  depth  of 
field  is  a function  of  the  range  which  separates  it  from  the  camera. 
These  results  can  be  verified  using  equation  3.4  in  section  3.4. 

5*4  Special  Case  of  the  Staircase  Example 

What  makes  the  staircase  an  interesting  problem  is  the  fact  that, 
in  general,  what  distinguishes  the  rise  (upright  step)  from  the  tread 
(flat  step)  is  shading.  On  the  other  hand,  the  staircase  is  a 
succession  of  flat  and  upright  steps.  To  use  the  shadow  identification 
process,  would  result  in  flattening  the  staircase,  and  the  upright 
versus  the  flat  object  analysis  would  simply  indicate  that  the  object  is 
an  upright  object.  It  is  for  this  reason  that  we  allow  the  first-pass 
evaluation  to  go  on  for  as  long  as  three  obstacle  warnings  such  as  to 
allow  for  ‘ON’  and  'OFF'  type  of  path  tracing,  which  is  then  used  as  an 
initial  indication  for  the  presence  of  a staircase  (or  may  be  a striped 
crosswalk).  If  the  second-pass  evaluation  has  also  been  used  to  provide 
a primitive  description  of  the  object,  this  primitive  description  can 
also  be  used  to  enforce  the  notion  that  a staircase  may  indeed  be 
present.  With  these  primary  cues,  the  recognition  algorithm  which 
identifies  a staircase  is  initiated.  Referring  to  Figure  5.11,  this 
algorithm  performs  the  steps  below: 

(1)  Obtain  m vertical  scans  on  the  binary  image 

(3  < m < 5)  and  determine  the  rises  r(s,k)  and  treads 

t(s,k).  Parameters  s and  k denote,  respectively,  the 

step  number  of  the  staircase  and  the  vertical  scan  line 
number. 
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Figure  5.11  Scanning  Process  for  the 

Identification  of  a Staircase 


Ill 


(2)  Determined  for  all  values  s and  k the  ratio 


r(s,k)  + t(s,k) 

This  ratio  is  used  in  agreement  with  the  standards  set 
by  the  building  codes  [70]. 

(3)  If  this  ratio  has  about  the  same  value  for  all  m 
vertical  scans,  a staircase  is  identified. 

Striped  crosswalks  are  not  very  common,  but  if  one  desires  to 
include  them  in  the  identification,  steps  (l)-(3)  can  be  repeated  except 

that  the  ratio  R(s,k)  for  a crosswalk  is  much  larger  than  the  ratio 
R(s,k)  for  a staircase. 

To  include  orientation,  the  ratio  = r( s , 1 ) / r(s,m)  can  be 
used  to  identify  the  following  situations: 

ar  < 1 the  staircase  is  to  the  left  of  the  blind 

a 1 the  blind  is  facing  the  staircase 

“r  > 1 the  staircase  is  to  the  right  of  the  blind. 

Computer  examples  are  illustrated  in  Figure  5.12. 
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IDENTIFICATION  RESULTS 

TMC  OBJECT  IS  A STAIRCASE  IDENTIFICATION  RESULTS 

THE  ORIENTATION  OR  THE  OBJECT  Hi 

THE  OIJECT  IS  A staircase 

YOU  ARC  RACING  THE  OBJECT  THE  ORIENTATION  OR  THE  OBJECT  IS: 

THE  OBJECT  IS  AT  YOUR  LEFT 


Figure  5.12  Computer  Results  for  a 
Staircase  Example 


CHAPTER  VI 

ANALYSIS  OF  MOTION  AND  RANGE  ESTIMATION  ERRORS 

In  the  design  of  a computer  vision  system  for  the  blind,  accurate 
mapping  of  the  3-D  real  world  onto  the  2-D  image  plane  is  essential  for 
providing  the  blind  with  adequate  interpretation  of  this  3-D  real  world. 
For  this  purpose,  it  is  of  great  importance  to  identify  the  sources  of 
error  which  could  affect  this  mapping.  In  the  first  section  of  this 
chapter,  we  assess  various  practical  situations  in  the  walking  phase, 
where  deviations  due  to  camera  motion  from  its  fixed  axis  and  the 
occasional  shift  of  the  blind  from  the  desired  direction  of  travel  take 
place.  In  the  second  section,  we  evaluate  the  error  in  range  estimation 
due  to  inclined  paths.  This  evaluation  is  performed  with  respect  to  our 
initial  assumption  that  the  computer  vision  system  is  to  confront 
environments  with  relatively  flat  ground  surfaces. 

6. 1 Motion  Errors 

In  this  section,  we  consider  the  following  deviations  (a)  the  tilt 
angle  <i>c  of  the  camera  (up  and  down  motion),  (b)  the  pan  angle  0c  of 
the  camera  (sideways  motion),  and  (c)  the  occasional  shift  AX  of  the 
blind  either  to  the  left  or  to  the  right  from  the  straight  ahead 
direction,  assuming  X is  the  direction  of  travel.  The  practical 
situations  we  discuss  below  are  based  solely  on  experimental 
observations.  Using  the  above  three  parameters,  one  can  derive 
mathematical  expressions  relating  a point  A with  point  B in  the  process 
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of  motion  using  homogeneous  transformations.  But  then,  since  our 
analysis  is  based  just  on  assumptions,  it  becomes  a futile  task  to  find 
the  corresondence  of  a point  in  the  frame  taken  at  point  A with  that 
same  point  in  the  frame  taken  at  point  B. 

6*1*1  Effect  of  the  Camera  Tilt  Angle  * 

The  tilt  angle  <p^  affects  the  range  estimation.  The  up  motion 
will  result  in  an  underestimation  of  the  range.  Conversely,  the  down 
motion  will  result  in  an  overestimation  of  the  range;  this  with  respect 
to  the  initial  set-up  of  the  camera  system.  To  evaluate  this 
underestimation  or  overestimation  of  the  range,  equation  3.2  derived  in 
section  3.4  is  used  with  a updated  to  account  for  the  effect  of 

c ' 

Effect  of  the  Camera  Pan  Angle  <f>  and  the  Veer  AX  of  the 
Blind 

These  two  parameters  are  used  conj  unctional  ly  for  the  purpose  of 
assessing  two  important  situations. 

(a)  Where  the  deviations  in  9c  and  AX  are  in  the  same 

direction,  as  long  as  the  veer  AX  by  the  blind  is  not 
significant  with  respect  to  the  deviation  of  9 , the 

first-pass  evaluation  will  monitor  such  a situation,  and 

would  redirect  the  blind  back  to  the  desired  direction 

of  travel . 

(b)  Where  the  deviations  in  9c  and  AX  are  of  opposite 

✓ 

direction  an  unfortunate  situation  may  occur,  for  the 
blind  may  be  heading  towards  an  obstacle,  while  the 

camera  is  pointing  to  some  other  area. 
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From  these  deviations,  we  learn  that  it  is  essential  to  have  a 
rigid  (stable)  camera  system,  and  with  that,  it  may  also  require  some 
cooperation  from  the  blind  in  the  picture  taking  process. 

6.2  Range  Estimation  Error  Due  to  Inclined  Paths 

With  the  assumption  of  flat  ground  surfaces,  this  section  derives 
the  range  estimation  error  due  to  inclined  paths,  such  that  they  can  be 
accounted  for. 


6.2.1  Range  Estimation  Error  Due  to  an  Uphill  Path 


Referring  to  Figure  6.1(a),  and  using  the  sine  law,  we  find  that  a 
range  Ruk,  given  an  uphill  path  with  an  inclination  of  angle  a,  is 
related  to  the  actually  computed  range  Rk,  assuming  a path  with  a flat 
ground  surface,  by  the  relation. 


where 


Yk  = 90°  - <j>k  and  <|>k  = arctan  (Rk/h)  - 8 
+ \ " a*  The  range  estimation  error  is  given  by 


and 


cos  <j>. 

eu(a)  = [1 — R 

COS  Uk-a) 

6-2.2  Range  Estimation  Error  Due  to  a Downhill  Path 

Referring  to  Figure  6.1(b),  using  similar  reasoning  as  in  section 

6.2.1,  we  find  that  the  range  Rdk  in  the  downhill  path  is  related  to  Rk 
by  the  relation 

sin  <5, 
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Camera 


(a)  Range  Estimation  Error  in  an  Uphill  Path 


Camera 


Figure  6.1  Range  Estimation  Error  Due  to  Inclined  Paths 


where 

defined 


5k  = 90“  + \ and  sdk  - 90«  - *k  - a.  Angle  ^ is 
above.  The  range  estimation  error  in  this  case  is  given  by 


CHAPTER  VII 

SUMMARY  AND  CONCLUSIONS 


7. 1 Summary 

In  this  dissertation,  we  presented  a computer  vision  approach  for 
guiding  the  blind.  The  design  aspects  of  the  required  computer  vision 
system  are  specified,  and  the  necessary  image  techniques  are  developed. 

In  Chapter  I,  we  indicated  that  the  need  for  research  in  guidance 
aids  for  the  blind  has  long  been  recognized,  but  progress  has  been  slow 
paced.  With  a problem  of  such  importance,  we  believe  that  extended 
joint  efforts  from  all  concerned  researchers  are  long  overdue. 

In  Chapter  II,  we  traced  the  research  progress  over  the  last  fifty 
years  and  found  that  the  majority  of  the  research  efforts  are  aimed  at 
improving  early  concepts  which,  in  our  opinion,  do  not  lend  themselves 
to  the  improvements  that  are  actually  sought.  The  few  studies  which 
have  taken  up  new  research  in  the  field  of  computer  vision  have  been, 
for  the  most  part,  specialized  research  efforts  for  specific  types  of 

applications  which,  we  fear,  cannot  extend  beyond  their  restricted 
domain  of  application. 

In  Chapter  III,  we  investigated  the  design  of  the  computer  vision 
system  for  the  blind.  We  described  the  real-world  domain  that  the 
vision  system  is  to  analyze  and  underlined  the  organizing  principles  of 
such  a domain.  An  integrated  vision  system  was  then  introduced,  which 

must  perceive,  characterize,  and  interpret  the  various  physical 
characteristics  of  real-world  scenes.  To  allow  for  a transition  between 
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the  computer  results  and  the  results  which  are  conveyed  to  the  blind,  a 
section  of  this  chapter  was  devoted  to  determining  the  mapping  of  a 2-0 
camera  image  with  its  3-D  real-world  representation. 

In  Chapter  IV,  the  focus  was  on  image  techniques  which  will  trace  a 
safety  path  for  the  blind  and  warn  him/her  of  oncoming  obstacles.  A 

novel  technique,  the  wide-angle  view,  was  introduced  for  the  purpose  of 
an  optimized  planning  of  the  safety  path.  Image  analysis  for  the  safety 
path  planning  is  based  on  the  surface  consistency  constraint  and  is 
complemented  by  the  perspective  projection  of  the  real  world  and  a 
specialized  range  finding  scheme.  This  safety  path  planning  analysis 
conforms,  to  a limited  extent,  to  the  way  we  humans  view  the  real 
world.  A technique  which  provides  primitive  descriptions  of  encountered 
object(s)  supplements  the  safety  path  planning  analysis. 

In  Chapter  V,  the  focus  was  placed  on  the  development  of  image 
techniques  as  a solution  to  three  very  important  problems  which  are 
shadow  identification,  detection  of  depressions,  and  descriminating 
upright  objects  from  flat  objects.  Our  approach  to  these  problems 
adopted  the  most  essential  features,  in  terms  of  their  computer 
implementation,  that  we  humans  use  to  identify  these  problems.  The 
results  were  found  to  be  very  encouraging. 

In  Chapter  VI,  we  evaluated  the  various  deviation  errors  the  vision 
system  is  bound  to  make,  and  suggested  some  ways  to  minimize  the  effect 
of  these  errors. 


7*2  Significance  of  this  Research 

An  initial  accomplishment  of  this  dissertation  has  been  to  gather 
knowledge  from  diverse  research  areas  to  prove  the  availability  of 
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current  technology  to  provide  the  blind  with  enhanced  mobility.  We  note 

that  we  do  not  advocate  replacing  either  the  guide  dog  or  the  cane,  our 

aim  is  to  bring  forward  a new  alternative  to  the  traditional  guidance 
aids. 

Our  second  accomplishment  is  that  we  have  excelled  the  usual 
research  approach  of  having  to  eliminate  apriori  all  burdens  which  will 
interfere  with  the  attainment  of  the  desired  goal;  rather  than  ignore 
problems  with  inherent  difficulties  such  as  identifying  shadows, 
detecting  depressions,  and  discriminating  upright  objects  from  flat 
objects,  we  have  attempted  to  solve  them.  For  each  of  these  problems, 
we  derived  either  by  means  of  experimental  evaluation,  mathematical 
derivations  or  a combination  of  both,  the  essential  features  that  best 
characterize  these  problems. 

A third  accomplishment  is  that  this  complex  problem  has  been 

effectively  divided  into  simpler  sub-problems  and  represented  by  a 

rather  simple  modular  architecture.  This  simplicity  in  the  architecture 

is  the  result  of  the  ability  to  categorize  the  various  elements  present 

in  the  real-world  environment  by  their  essential  features.  This  modular 

design  has  an  added  advantage,  in  that  it  allows  for  concurrent 
processing. 


7*3  Recommendations  for  Future  Research 

In  order  to  understand  the  limitations  of  image  processing 
techniques  in  general,  one  should  also  understand  the  immense  knowledge 
and  processing  ability  of  the  human  visual  system  [71,72].  In  the 
development  of  the  image  techniques  we  presented  here,  we  make  use  of 
only  a fraction  of  this  knowledge,  and  use  only  that  information  which 
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can  be  put  into  some  symbolic  form.  Unfortunately,  the  point  is  that 
there  is  much  more  ground  to  cover.  In  the  specific  problem  areas  that 
we  have  addressed  here,  we  recommend  that  further  research  be  carried 
out  on  the  following  areas: 

A.  The  surface  consistency  constraint  which  is  based  on  the 
coherence  of  matter  is  a fact,  but  with  this  fact,  one  is 
also  to  consider  that  the  outdoor  world  is  a complex  and 
noisy  world.  With  this  issue,  it  appears  that  a general 
aproach  to  the  problem  should  include  further  knowledge  on 
the  modelling  of  the  way  light  is  reflected  by  surfaces 

[73.74] ,  and  on  the  perception  of  surface  orientation 

[21.75] . 

B.  The  utilization  of  two  or  more  cameras,  which  will  have  many 
advantages. 

(a)  Any  analysis  requiring  image  correspondence  is  greatly 
facilitated  since  the  distance  between  the  two  cameras 
and  their  positional  relationships  are  apriori  set. 

(b)  Stereo  vision  becomes  available  without  recourse  to  the 
optical  flow  approach. 

(c)  In  the  case  of  the  detection  of  depressions,  extraction 
of  the  occluded  information  is  facilitated.  This  can  be 
achieved  by  simply  pointing  one  camera  to  the  ground 
plane  at  an  angle  ( 61 ) smaller  than  the  angle  (g  ) 
of  the  second  camera.  This  way  as  the  vision  system 
moves  closer  to  a depression,  in  case  the  camera 
pointing  to  the  ground  with  the  angle  @2  misses  to 
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detect  the  occluded  information,  the  camera  with  angle 
will  detect  this  occluded  information. 

C.  An  easy  approach  for  determining  disparity  in  stereo  images 
should  be  found  to  yield  a simpler  and  efficient  process  of 
recovery  of  depth  or  range  information.  Rather  than  follow 
the  traditional  approach  of  finding  the  same  point  or 
location  in  the  two  images,  maybe  we  can  use  some  correlation 
measure  on  partial  areas  of  the  stereo  images  to  determine 
the  shift  (disparity)  that  has  taken  place  between  the  stereo 
images.  For  example,  we  can  extract  either  the  vertical 
edges  or  the  horizontal  edges  that  exist  in  the  two  images, 
then  convolve  these  two  images  and  find  the  point  where 
maximum  correlation  is  attained.  If  we  use  the  correlation 
function  defined  in  section  5.1.1,  this  translates  in  finding 
the  shifts  &l  and  6^  in  the  x and  y coordinates, 

respectively  where  this  correlation  function  is  closest  to 
unity.  With  the  parameters  and  , the  camera 

parameters,  and  the  cameras  lateral  displacement,  range 
information  can  then  be  obtained  by  simple  triangulation. 

D.  A feasible  approach  should  be  determined  to  insure  harmony 
between  the  conveying,  by  the  computer,  of  the  audio  results 
with  the  walking  of  the  blind. 

The  aspects  of  portability  of  the  computer  vision  system 
should  be  investigated  with  regards  to  the  processing  unit 
only.  Other  elements  of  these  vision  systems  are  either 
readily  portable  or  can  be  repackaged  for  portability. 
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F.  The  multiple  processing  concept  should  be  implemented,  where 
each  processor  is  to  be  dedicated  to  a specific  subtask  of 
the  integrated  vision  system.  For  example,  we  can  have  a 
processor  dedicated  for  the  path  planning,  another  for  shadow 
identification,  etc.  A central  processing  unit  will  control 
the  information  processing  of  these  specialized  processors. 
The  various  algorithms  such  as  the  fast  Fourier  transform 
(FFT),  histogram  generation,  edge  detection,  etc.,  should  be 
implemented  in  cellular  VLSI  architectures  [29,76,77].  The 
performance  using  such  implementations  are  very  impressive. 
For  example,  an  N samples  FFT  which  takes  time  in  order 
0 (N  log2  N)  on  a serial  computer,  is  performed  using  the 
parallelism  of  a butterfly  or  shuffle  exchange  network  in  the 
order  0 (log2  N)  . As  another  example,  a histogram  of 
256x256  image  on  a POP  11/40  minicomputer  takes  about  5 secs 
to  generate.  This  same  histogram  is  generated  in  2 msec 
using  cellular  VLSI  architecture. 

G.  The  following  important  points  should  be  investigated. 

(a)  The  utilization  of  color  as  an  additional  feature  to 
enhance  image  interpretation. 

(b)  The  inclusion  of  a compass  to  enhance  the  guidance  of 
the  blind. 

(c)  The  inclusion  of  a control  panel  for  blind-vision  system 
interaction. 

(d)  The  integration  of  other  forms  of  sensing  devices  to 
complement  the  camera. 
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In  retrospect,  it  is  clear  that  in  the  design  of  a computer  vision 
system  for  the  blind  one  is  faced  with  many  complex  issues  that  are  yet 

to  be  resolved.  But  then  again,  there  have  seldom  been  easy  answers  to 
important  problems. 

After  few  years  of  research  in  this  field,  we  came  to  believe  that 
a solution  to  this  probl em- exi sts.  But,  this  solution  would  be  reached 
only  if  genuine  efforts  are  extended  in  this  research  direction. 

We  hope  that  many  well  intentioned  researchers  will  take  up  this 

challenging  problem  in  an  effort  to  help  the  blind  'see'  more  of  this 
world. 
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